MATHEMATICAL BIOLOGY OF SOCIAL BEHAVIOR: IV.
IMITATION EFFECTS AS A FUNCTION 
OF DISTANCE 


N. RASHEVSKY 
COMMITTEE ON MATHEMATICAL BIOLOGY 
THE UNIVERSITY OF CHICAGO 


In previous publications, social groups have been studied in which 
each individual has a preference for one of two possible mutually
exclusive activities. This preference is measured by a quantity $\phi$. The
value $\phi = 0$ corresponds to no preference; a preference for one
activity is measured by a positive $\phi$, the preference for the other by a
negative $\phi$. The quantity $\phi$ varies from individual to individual. It has
been shown previously that, owing to effects of imitation, even when 
the average ¢ for the group is zero, one of the two behaviors will be 
chosen by the majority of the group. Whereas in previous studies the 
imitation effect was considered as independent of the distance between 
the imitating and imitated individuals, in the present study the case is 
considered in which the effect of imitation decreases with the distance 
between the individuals. It is found that under certain conditions a 
greater percentage near the center of the area occupied by the group, 
rather than near the periphery, exhibits the chosen behavior. The pos- 
sible sociological meaning of this gradient of behavior is discussed. 


This paper is a generalization of previous studies (Rashevsky, 
1949; 1950) and familiarity of the reader with either one of the 
above references is essential. In those previous publications we con- 
sidered the effect of imitation of one individual by another as being 
independent of the physical distance between the two individuals. 
Here we shall consider the more general case, in which imitation is 
a function of distance. The most direct cause of the dependence of 
imitation upon distance may lie in the circumstance that the farther 
away the two individuals are geographically, the less frequently they 
are likely to see each other. In a properly developed theory, the func- 
tion which determines the dependence of imitation on distance should 
be derived from such considerations. It will involve as parameters 
such quantities as the physical mobility of the individuals, which de- 
pends on methods of transportation, and the amount of communica- 
tion transmissible from one individual to another, as a function of 
their distance. Leaving these possibilities to the future, we shall
investigate here a very special case, in which a function of distance is
chosen so as to make our equations soluble. This at least shows some 
general possible properties of such situations. 

We shall use the same notations as in loc. cit., except that we
shall denote now by $x$ and $y$ the geographic coordinates of an
individual. For the number of those individuals with a given $\phi$ who exhibit
respectively behaviors $R_1$ and $R_2$ we shall now use $n(\phi)$ and $m(\phi)$
respectively. We shall assume a constant density of population.

If the imitation depends on the distance, then, in general, the
quantity $\psi$ will be a function of the coordinates $x, y$ of the
individual. Denote by $X(x',y')\,dx'\,dy'$ the number of those individuals
whose coordinates lie in the interval $(x', x' + dx')$ and $(y', y' + dy')$
and who exhibit behavior $R_1$. Denote by $Y(x',y')\,dx'\,dy'$ the
corresponding quantity for behavior $R_2$. Let $K(x,y; x',y') = K(r_{12}) =
K[(x - x')^2 + (y - y')^2]$ be a function of the distance

$$r_{12} = \sqrt{(x - x')^2 + (y - y')^2}$$

between the two points $x,y$ and $x',y'$. The function $K(r_{12})$ is
symmetric with respect to the points $x,y$ and $x',y'$, so that

$$K(x,y; x',y') = K(x',y'; x,y). \tag{1}$$


Generalizing the concept used in loc. cit. we consider that the
quantity

$$[X(x',y') - Y(x',y')]\,K(x,y; x',y')\,dx'\,dy' \tag{2}$$

contributes to the stimulus $E$ which results in an increase of $\psi$ for
individuals located at the point $x,y$. The total stimulus $E$ is given by

$$E = \iint_S [X(x',y') - Y(x',y')]\,K(x,y; x',y')\,dx'\,dy', \tag{3}$$

where the integration is extended over the whole area $S$ occupied by
the population. Therefore, instead of equation (8) of loc. cit., we
now obtain

$$\frac{d\psi(x,y)}{dt} = A \iint_S [X(x',y') - Y(x',y')]\,K(x,y; x',y')\,dx'\,dy' - a\psi(x,y). \tag{4}$$

By the same argument as that used in loc. cit. we find that $X(x',y') -
Y(x',y')$ is a function $F(\psi)$ of $\psi(x',y')$ at the point $x',y'$. Depending
on the assumptions which we make about the distribution function
for $\phi$ and for &, we find $F(\psi)$ to be either of the form given in loc.
cit., by H. G. Landau (1950), or H. D. Landahl (1950). In all cases,
it is a function which is zero for $\psi = 0$, then for $\psi > 0$ increases
monotonically to the asymptotic value $N_0(x',y')$, where $N_0(x',y')$ is
the population density at $x',y'$. Hence equation (4) may be written

$$\frac{d\psi(x,y)}{dt} = A \iint_S F[\psi(x',y')]\,K(x,y; x',y')\,dx'\,dy' - a\psi(x,y). \tag{5}$$

The solution of this nonlinear integrodifferential equation
depends on the size and shape of the area $S$. We shall limit ourselves
here to the very special case in which the area $S$ is a circle of radius
$R$, and in which all quantities are functions only of the distance $r$
from the center. Furthermore, we shall restrict ourselves to the study
of the steady states, in which $d\psi/dt = 0$. Equation (5) now becomes

$$a\psi(r) = A \int_0^R F[\psi(r')]\,K_1(r,r')\,dr'. \tag{6}$$


For the function $K[(x - x')^2 + (y - y')^2]$ we may choose any
plausible function of $r_{12}$, which decreases with distance, tending to
zero when $r_{12} \to \infty$, and remains always positive. To make our
equations soluble we shall put as an approximation

$$K(r_{12}) = 1 - \mu r_{12}^2 = 1 - \mu[(x - x')^2 + (y - y')^2], \tag{7}$$


that is, use an inverted parabola. For sufficiently large distances K 
becomes negative, which is physically meaningless. If, however, we 
take 


$$\mu < \frac{1}{4R^2}, \tag{8}$$

we insure the positivity of K for all pairs of points within the circle 
of radius R. The inverted parabola may be considered as an approxi- 
mation, for not too great a distance, to a curve which, for example, 
looks like a normal distribution curve, and approaches zero asymp- 
totically. 

Introducing polar coordinates $r$ and $\theta$, we have now for any
function $f(x',y')$

$$\begin{aligned}
\iint_S f(x',y')\,K(x,y; x',y')\,dx'\,dy'
 &= \iint_S f(x',y')\,\{1 - \mu[(x - x')^2 + (y - y')^2]\}\,dx'\,dy' \\
 &= \int_0^{2\pi}\!\!\int_0^R f(r',\theta')\,[1 - \mu(r^2 + r'^2 - 2rr'\cos(\theta - \theta'))]\,r'\,dr'\,d\theta'.
\end{aligned} \tag{9}$$

If $f$ does not depend on $\theta'$, so that $f(r',\theta') = f(r')$, then expression
(9) becomes, after integrating with respect to $\theta'$,

$$2\pi \int_0^R f(r')\,[1 - \mu(r^2 + r'^2)]\,r'\,dr'. \tag{10}$$
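
The angular integration that leads from (9) to (10) can be written out in one line. The step below is only a worked restatement of what the text describes; it uses the inverted-parabola kernel (7) and introduces no new quantities.

```latex
\int_0^{2\pi} \bigl[1 - \mu\,(r^2 + r'^2 - 2 r r' \cos(\theta - \theta'))\bigr]\, d\theta'
  = 2\pi \bigl[1 - \mu (r^2 + r'^2)\bigr]
    + 2 \mu r r' \int_0^{2\pi} \cos(\theta - \theta')\, d\theta'
  = 2\pi \bigl[1 - \mu (r^2 + r'^2)\bigr],
```

since the cosine integrates to zero over a full period. The cross term in $r r'$ therefore drops out, which is what makes the kernel separable in $r$ and $r'$.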


We thus see that in equation (6) $K_1(r,r') = 2\pi[1 - \mu(r^2 + r'^2)]\,r'$.
Putting

$$\lambda = \frac{2\pi A}{a}, \tag{11}$$

we may write equation (6) thus:

$$\psi(r) = \lambda \int_0^R F[\psi(r')]\,[1 - \mu(r^2 + r'^2)]\,r'\,dr'. \tag{12}$$


Since $F(0) = 0$, therefore $\psi = 0$ is a solution of this integral
equation. We shall now look for other possible solutions.


Equation (12) may also be written

$$\psi(r) = \lambda(1 - \mu r^2) \int_0^R F[\psi(r')]\,r'\,dr' - \lambda\mu \int_0^R F[\psi(r')]\,r'^3\,dr'. \tag{13}$$

Both integrals are constants, independent of $r$. Putting

$$L = \lambda \int_0^R F[\psi(r')]\,r'\,dr'; \qquad M = \lambda\mu \int_0^R F[\psi(r')]\,r'^3\,dr', \tag{14}$$

we find from (13)

$$\psi(r) = L - M - \mu L r^2. \tag{15}$$


Since $F(\psi)$ is a known function, therefore by introducing (15)
into (14), we obtain two equations for the determination of the two
constants $L$ and $M$. If those equations have real roots, then (15)
represents a solution of the integral equation (12).

To find the two equations, we introduce $\psi$ as the integration
variable in (14). We have from (15)

$$r'\,dr' = -\frac{d\psi}{2\mu L}. \tag{16}$$

Introducing this into the first equation (14) we find

$$L = -\frac{\lambda}{2\mu L} \int_{L-M}^{L-M-\mu L R^2} F(\psi)\,d\psi. \tag{17}$$

The second equation (14) together with equation (17) now gives

$$M = L - M + \frac{\lambda}{2\mu L^2} \int_{L-M}^{L-M-\mu L R^2} \psi\,F(\psi)\,d\psi. \tag{18}$$

As has been said, the general properties of social imitation do
not depend on the particular form of $F(\psi)$, as long as it satisfies the
conditions mentioned above. We shall choose here, as an illustration

$$F(\psi) = N_0\,\frac{\gamma\psi}{1 + \gamma\psi}, \tag{19}$$

where $\gamma$ is a constant.

Since $F(\psi)$ is thus defined only for positive values of $\psi$,
therefore the integrals in (17) and (18) have a meaning only if both
limits are positive. This implies, in particular, $L - M > 0$. A priori
we cannot be sure of this since $L$ and $M$ are to be determined by
equations (17) and (18). We shall, however, treat those equations
on the assumption that $L - M > 0$, and $L - M - \mu L R^2 > 0$, and
then show that this leads actually to positive values of those two
expressions, thus proving that such positive solutions exist.

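Introducing (19) into (17) and evaluating the integral, we find

$$L = \lambda N_0 \left[ \frac{R^2}{2} + \frac{1}{2\mu L \gamma} \log\left(1 - \frac{\gamma\mu L R^2}{1 + \gamma(L - M)}\right) \right]. \tag{20}$$

In a similar way we find from (18)

$$M = L - M + \lambda N_0 \left[ \frac{R^2}{2L}\,\frac{1 - \gamma(L - M)}{\gamma} + \frac{R^4\mu}{4} + \frac{1}{2\mu L^2\gamma^2} \log\left(1 - \frac{\gamma\mu L R^2}{1 + \gamma(L - M)}\right) \right]. \tag{21}$$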


The above expressions simplify considerably if $\mu$ is so small that
we may expand the log terms and preserve only the lowest
nonvanishing powers of $\mu$. This is quite in line with condition (8).
With linear terms in $\mu$ only we find from (20)

$$L = \frac{\lambda N_0 R^2}{2}\,\frac{\gamma(L - M)}{1 + \gamma(L - M)}. \tag{22}$$

If we do the same thing with (21), we find $M = 0$, $M$ being a
difference of two finite terms. We must therefore preserve terms in $\mu^2$
also. This gives

$$M = \frac{\lambda N_0 R^4 \mu}{4}\,\frac{[1 + \gamma(L - M)]^2 - 1}{[1 + \gamma(L - M)]^2}. \tag{23}$$

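Subtracting equation (23) from (22) and introducing a new variable

$$C = L - M, \tag{24}$$

we find

$$C = \frac{\lambda N_0 R^2}{2}\,\frac{\gamma C}{1 + \gamma C} - \frac{\lambda N_0 R^4 \mu}{4}\,\frac{(1 + \gamma C)^2 - 1}{(1 + \gamma C)^2}. \tag{25}$$

If $\mu$ is very small, so that the second term is negligible, equation (25)
has the roots $C = 0$ and

$$C = \frac{\lambda N_0 R^2}{2} - \frac{1}{\gamma}, \tag{26}$$

which is positive if

$$\lambda\gamma N_0 R^2 > 2. \tag{27}$$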


Equations (22) and (23) show that for $C > 0$, $L > 0$, and for
sufficiently small $\mu$, $L - M - \mu L R^2 > 0$. If $\mu$ is not negligible, but small,
so that

$$R^2\mu \ll 2, \tag{28}$$

which almost follows from (8), equation (25) still has a positive
root if (27) is satisfied, as is readily seen from graphing the right
side of (25). If, however, $\mu$ becomes sufficiently large, so that the
two terms of the right side of (25) are comparable, then equation
(25) has only the root $C = 0$. Equations (22) and (23) then give
$L = M = 0$, and from (15) it follows that $\psi = 0$.
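
The reduction of the integral equation (12) to the two constants L and M can be checked numerically. The following is a minimal sketch and not part of the original paper. It assumes the illustrative response function (19), arbitrary parameter values chosen so that conditions (8) and (27) hold, and a plain fixed-point iteration with trapezoidal quadrature. The name `lam` stands for the constant multiplying the integral in (12); all function names are hypothetical.

```python
def F(psi, N0, gamma):
    # Illustrative response function of equation (19): N0*gamma*psi / (1 + gamma*psi).
    return N0 * gamma * psi / (1.0 + gamma * psi)


def trapezoid(values, step):
    # Composite trapezoidal rule on a uniform grid.
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def solve_L_M(lam=1.0, mu=0.01, R=1.0, N0=1.0, gamma=4.0, n=400, iterations=100):
    # Assumed numerical scheme (not from the paper): insert the form (15),
    # psi(r) = L - M - mu*L*r**2, into the definitions (14) of L and M
    # and repeat until the two constants settle.
    step = R / n
    radii = [i * step for i in range(n + 1)]
    L, M = 0.3, 0.0  # arbitrary positive starting guess
    for _ in range(iterations):
        psi = [L - M - mu * L * r * r for r in radii]
        f_vals = [F(p, N0, gamma) for p in psi]
        L = lam * trapezoid([f * r for f, r in zip(f_vals, radii)], step)
        M = lam * mu * trapezoid([f * r ** 3 for f, r in zip(f_vals, radii)], step)
    return L, M


L, M = solve_L_M()
C_numeric = L - M                   # psi at the center, r = 0
C_small_mu = 1.0 * 1.0 / 2.0 - 1.0 / 4.0   # equation (26) with lam = N0 = R = 1, gamma = 4
print(C_numeric, C_small_mu)
print(L - M - 0.01 * L)             # psi at the periphery, r = R
```

With these values the central value of the solution stays close to the small-parameter root (26), and the value at r = R is slightly lower than at the center, which is the gradient discussed in the text.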

We thus see that with a proper choice of constants, the integral 
equation (12) has a positive solution of the form (15). The adopted
behavior, in this case $R_1$, is exhibited by more individuals in the
center of the area than at the periphery. 

This may have definite sociological implications, in regard to 
behavior patterns characteristic for “the heart of a country.” The 
physical reason for such a distribution of behavior is that even for 
uniform population density, an individual has more neighbors in the 
center than at the periphery. At the boundary he is surrounded by 
neighbors only on one side. Differences in behavior patterns in the 
heart of a country and at its borders are frequently due to the effects 
of adjacent countries with different behavior patterns. But the pos- 
sible existence of the intrinsic effect discussed in this paper must 
be kept in mind. 

It would be of interest to study areas which have the shape of 
either rectangles or ellipses, and see what effect the increase of the
“specific boundary” would have. Better approximations than the
inverted parabola should also be sought. If the distance effect on
imitation is rather pronounced (rather large $\mu$), then only neighbors in
a limited region will have an effect. We still could use an inverted
parabola, but extend the integration with respect to $x',y'$ in equation
(5) only to a circle of radius $1/\sqrt{\mu}$ around the point $x,y$. Inside of
that radius $1 - \mu r^2 > 0$. Outside it is negative and we may
interpret this as meaning that there is no influence from outside that
circle. In this case the integral in (5) becomes a function of $(x,y)$
not only by virtue of $\psi$ containing those two variables, but also
because $x$ and $y$ enter into the limits of integration. In such a case the
results may possibly not depend too much on the size of the area $S$
and a possibility of several maxima or minima for $\psi$ should be
investigated.

The restriction to sufficiently small values of $\mu$ seems to be
connected with the circumstance that $K(r_{12})$ becomes negative for
sufficiently large values of $r_{12}$. This is evidenced by the following
consideration.

Put:

$$K_1(r,r') = 2\pi\,e^{-\mu(r^2 + r'^2)}\,r'. \tag{29}$$

Introducing this into equation (6) and putting

$$Q = \lambda \int_0^R F[\psi(r)]\,e^{-\mu r^2}\,r\,dr, \tag{30}$$

we find

$$\psi(r) = Q\,e^{-\mu r^2}. \tag{31}$$

From (31) we have

$$\frac{d\psi}{dr} = -2\mu r\,Q\,e^{-\mu r^2}. \tag{32}$$

Remembering that for $r = 0$, $\psi = Q$, and for $r = R$, $\psi = \psi_0$, where
$\psi_0$ is a very small quantity when $R$ and $\mu$ are sufficiently large, we
find from (30):

$$Q^2 = \frac{\lambda}{2\mu} \int_{\psi_0}^{Q} F(\psi)\,d\psi. \tag{33}$$


If $F(\psi)$ is of the general shape specified above, then its integral
curve is symmetric with respect to $\psi = 0$, has a zero tangent at
$\psi = 0$, is everywhere positive, and has two asymptotes, symmetric
with respect to the line $\psi = 0$. Hence for a proper choice of $\lambda/\mu$,
regardless of the value of $\mu$, equation (33) has two symmetric roots,
one positive, the other negative.

It is, however, difficult to say to what expression $K(r_{12})$
corresponds the expression (29) for $K_1(r,r')$.

Essentially similar methods may be applied for the case in which
$K(x,y; x',y')$ is not symmetric. This is likely to occur in many
sociological situations. For example, an individual living in the city
is less likely to imitate one living in the country than vice versa. The
absolute geographical coordinates of the individual may thus affect
his tendency to imitate or his property to be imitated. A convenient
kernel $K(r,r')$ in this case may be

$$e^{-k(r - r')}$$


which will permit the use of the same method as has been used here. 

We may generalize further the problem by considering a
prescribed distribution of density $N_0$.

Similar types of problems arise when we consider the effect of 
social distance on imitation. Nonsymmetrical functions of the type 
of (29) will be the rule here. The problem is somewhat simplified 
by being inherently one-dimensional.


CONTRIBUTION TO THE PROBABILISTIC THEORY OF
NEURAL NETS: II. 


FACILITATION AND THRESHOLD PHENOMENA 


ANATOL RAPOPORT 
COMMITTEE ON MATHEMATICAL BIOLOGY 
THE UNIVERSITY OF CHICAGO 


The output curve of a single neuron with a threshold of response 
with respect to the frequency of the stimuli is derived. If the stimuli 
are regularly spaced in time, the output curve has discontinuities. If 
the threshold and/or refractory period are sufficiently large, the output 
curve approaches the “all-or-none” curve. 

In the case of completely randomized stimuli, the output curve is 
sigmoid. The equation of this curve is derived and some properties are 
studied. Threshold and “all-or-none” effects can be achieved by “pyr- 
amiding” neurons of this type to converge on neurons of higher order. 


In our discussion of the filter net (Rapoport, 1950a), we postu- 
lated a neuron possessing a threshold of response with respect to the 
frequency of incoming stimuli. The concept of threshold is funda- 
mental in the theory of the propagation of nervous impulses. How- 
ever, this concept varies in the several approaches to neural physi- 
ology. In all such discussions an “intensity” of the incoming stimu- 
lus plays a part. However, the question “intensity of what?” is 
answered differently by different authors and sometimes is not an- 
swered at all. Indeed, for the purposes of a quantitative theory it 
is often of advantage to omit all reference to the physical nature of 
the quantities considered, as has been done by Rashevsky, Hill, and 
others. 

To be sure, in some models of the nervous system the specific 
character of “intensity” is to some extent determined by the model 
itself. Thus in the model of W. S. McCulloch and W. Pitts (1948) 
the “intensity” of a stimulus impinging on a cell body is implicitly 
defined as the number of end bulbs terminating upon the cell body 
which fire simultaneously. Thus in this model the threshold becomes 
simply the number of such bulbs which, when firing simultaneously, 
cause the neuron on which they impinge to fire. Here threshold is 
dimensionless. On the other hand, in our treatment (Rapoport,
1950a), intensity of input for a given neuron is defined as the
frequency of stimuli received. Here threshold is a frequency and has
dimensions $[T]^{-1}$.

The so called “all-or-none” law has likewise different interpre- 
tations in different models of the nervous system. 

In the McCulloch-Pitts’ picture the concept of “intensity of fir- 
ing” is not applicable to a single neuron. The neuron either fires or 
does not fire, so that the all-or-none law applies to all neurons in 
that model. In our model the situation is different. Intensity (fre- 
quency) does have a meaning in connection with a single neuron. 
An all-or-none neuron (or aggregate), according to the frequency 
concept of intensity, would have to be one which responds with the 
same frequency (or not at all) to all stimulus frequencies. Such be- 
havior is not an inherent characteristic of the model. If we wish to 
postulate neurons or aggregates which do exhibit an “all-or-none” 
behavior in this sense, we must derive conditions for this to be the 
case. In our construction of neural nets which simulate certain as- 
pects of neurophysiological and psychological behavior we shall pos- 
tulate the existence of such aggregates. Therefore, our problem is to 
derive the structure and parameters of such aggregates. 


Regularly Spaced Stimuli. 


The input-output curve of an “all-or-none” neuron is given by
the following discontinuous function

$$f(x) = 0, \quad \text{for } x < h; \qquad f(x) = \text{constant} > 0, \quad \text{for } x \ge h. \tag{1}$$


The graph is shown in Figure 1. 


Figure 1. All-or-none output curve with threshold h. (Axis labels: INPUT, OUTPUT; the point x = h is marked.)




If a neuron receives regularly spaced stimuli and has a certain
threshold of response, its output curve will obviously contain a
discontinuity at the threshold frequency of the input. In fact, let $\delta$ be
the refractory period of the neuron and $h$ the threshold. Then, if $x$
is the input frequency, we have for the output

$$f(x) = 0, \quad \text{for } x < h; \qquad f(x) = \frac{x}{h + [\delta x]}, \quad \text{for } x \ge h, \tag{2}$$

where $[\delta x]$ is the greatest integer less than $\delta x$. The function (2) is
discontinuous not only at $x = h$, but also at all the successive values
of $x$ for which $\delta x$ is an integer, i.e., for $x = n/\delta$ where $n$ takes the
integral values $n_0$, $n_0 + 1$, etc., and where $n_0$ is defined by the
inequality

$$n_0 - 1 \le \delta h < n_0. \tag{3}$$


The discontinuities of f(x) are thus equally spaced. The slope of 
f(x) is likewise a discontinuous function 


$$f'(x) = (h + [\delta x])^{-1}, \quad x \ge h, \tag{4}$$

where $[\delta x] = n_0, n_0 + 1$, etc.


FIGURE 2. Input-output curve of a single neuron with a threshold.


The graph of (2) is shown in Figure 2. The approximation to
the output of an all-or-none neuron (Figure 1) will be the closer,
the smaller the slope $f'(x)$ immediately to the right of the first
discontinuity. But this is

$$f'(h) \approx [h(1 + \delta)]^{-1}. \tag{5}$$

Thus the quantity (5) can be said to measure the deviation from the
all-or-none law.


Theorem 1. If a neuron receives regularly spaced stimuli of
which $h$ must impinge per unit time to fire the neuron, the deviation
from the all-or-none law is given by $[h(1 + \delta)]^{-1}$, where $\delta$ is the
refractory period of the neuron.


Corollary. The all-or-none law is approximated the more closely 
the greater the threshold and the refractory period of the neuron. 
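
The staircase behavior described in this section can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not the author's computation: it reads equation (2) as x / (h + [δx]) above the threshold, which is the form implied by the slope formula (4), and the values of `h` and `delta` are arbitrary. The function names are hypothetical.

```python
import math


def greatest_integer_below(value):
    # "Greatest integer less than value", as the bracket is defined in the text.
    return math.ceil(value) - 1


def regular_output(x, h, delta):
    # Assumed reading of equation (2): zero below the threshold h,
    # x / (h + [delta * x]) at or above it.
    if x < h:
        return 0.0
    return x / (h + greatest_integer_below(delta * x))


h, delta = 10.0, 0.25   # illustrative values only
eps = 0.1               # small step that stays inside the first linear segment
slope_after_threshold = (regular_output(h + eps, h, delta)
                         - regular_output(h, h, delta)) / eps
theorem_estimate = 1.0 / (h * (1.0 + delta))
print(slope_after_threshold, theorem_estimate)
```

The finite-difference slope just above the threshold equals 1 / (h + [δh]), and the expression of Theorem 1 approximates it by replacing the bracket with δh. Raising either `h` or `delta` lowers both numbers, which is the content of the corollary.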


Randomized Stimuli. 


The foregoing case is probably of very limited physiological im- 
portance. If it is supposed that the incoming stimuli are the firings 
of a single neuron, then the threshold concept becomes dependent on 
“temporal summation,” a hypothesis largely discredited among neu- 
rophysiologists. Indeed, it is impossible to account for temporal sum- 
mation if the refractory periods exceed the periods of facilitation (or 
latent addition), because one would need to suppose that the neuron, 
whose firings are “summed,” fires with a frequency greater than that 
allowed by its refractory period. 

A more plausible assumption is that the stimuli come from sev- 
eral different neurons. Facilitation then depends on “spatial sum- 
mation,” a hypothesis generally accepted in neurophysiology. But 
then the difficulty arises in supposing the stimuli equally spaced. In 
the paper mentioned above of McCulloch and Pitts and in the subse- 
quent work of N. Rashevsky (1946) and J. B. Roberts (1948) this 
assumption is fundamental. However, we consider the clock-work 
precision of synchronized firing a rather artificial hypothesis and will 
abandon it, even though it accounts very simply for frequency thresh- 
olds. We shall suppose instead, as in our previous paper, a complete 
randomization of incident stimuli and compute the input-output curve 
for a neuron with a facilitation mechanism (i.e., spatial summation 
of stimuli falling within a period of latent addition). We shall fur- 
ther inquire into the conditions under which the input-output curve 
may approach that of an all-or-none neuron or at least exhibit a 
threshold effect. 

Consider a neuron receiving a shower of completely randomized
stimuli of average frequency $x$ per second from the outside and
responding only to the $h$th stimulus of a group which falls within the
interval $\sigma$, provided none of the $h$ stimuli impinges during the
refractory period.

Take the origin of time at $\delta$ seconds after the neuron has fired.
Consider now the set of points on the time axis at which stimuli are
received. Somewhere on the time axis there will be a first stimulus
of a group of $h$ stimuli which falls within $\sigma$. Call this first stimulus
$s$, and let us compute the probability distribution of the time of
occurrence of $s$.

Since the incident stimuli are completely randomized, the
probability of $h - 1$ additional stimuli occurring within $\sigma$ seconds after $s$
is independent of time, i.e., is a function of $x$ and $\sigma$ only. Call this
function $g(x,\sigma)$. Then the probability of $s$ falling in an arbitrarily
chosen infinitesimal time interval $dt$ will be

$$P(t)\,dt = \left[1 - \int_0^t P(\tau)\,d\tau\right] (x\,dt)\; g(x,\sigma). \tag{6}$$

The right side of (6) is a product of three mutually independent
probabilities, namely, in the order of factors: 1) that there has been
no $s$ in the interval $(0, t)$; 2) that a stimulus is impinging in $dt$; and
3) that $h - 1$ additional stimuli are impinging within $\sigma$ seconds
after $s$.

Dividing both sides by $dt$ and differentiating with respect to $t$,
we obtain the differential equation

$$P'(t) = -x\,g(x,\sigma)\,P(t), \tag{7}$$

which upon integration yields

$$P(t) = A\,e^{-x g(x,\sigma)\,t}, \tag{8}$$

where $A$ is the constant of integration.
Normalization of $P(t)$ demands that

$$A = x\,g(x,\sigma). \tag{9}$$


It remains to determine $g(x,\sigma)$.
Note that

$$g(x,\sigma) = \int_0^{\sigma} P^{(h-1)}(x,\tau)\,d\tau, \tag{10}$$

where $P^{(h-1)}(x,t)$ is the probability distribution of the time of the
$(h - 1)$th event of a Poisson-distributed sequence of average
frequency $x$. The recurrence formula for these distributions is given
by (Jost, 1947)

$$P^{(k)}(t) = \int_0^{t} P^{(k-1)}(\tau)\,P^{(1)}(t - \tau)\,d\tau, \tag{11}$$

where $P^{(1)}(\tau)$ is the distribution of the first event. In the case of the
Poisson distribution, this leads to

$$P^{(h-1)}(t) = \frac{x^{h-1}\,t^{h-2}}{(h-2)!}\,e^{-xt}, \tag{12}$$

$$g(x,\sigma) = \int_0^{\sigma} \frac{x^{h-1}\,\tau^{h-2}}{(h-2)!}\,e^{-x\tau}\,d\tau. \tag{13}$$



We define the polynomials $E_k(z)$ as partial expansions of
$\exp\{z\}$, broken off at the $k$th power of the argument. Thus

$$E_k(z) = \sum_{j=0}^{k} z^j/j!. \tag{14}$$

Integrating the right side of (13) by parts and simplifying gives

$$g(x,\sigma) = 1 - E_{h-2}(x\sigma)\,e^{-x\sigma}. \tag{15}$$

But equation (8) shows that the distribution of the instant $s$ is a
Poisson distribution with average frequency $x\,g(x,\sigma)$. Therefore the
expected time of occurrence of $s$, counted from any instant $\delta$ or more
seconds following a firing instant, is given by

$$\{x[1 - E_{h-2}(x\sigma)\,e^{-x\sigma}]\}^{-1}. \tag{16}$$
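
The closed form (15) can be compared against a direct numerical integration of the density in (13). The sketch below is illustrative only; the parameter values, the trapezoidal rule, and the function names are assumptions made for the example, and the code requires h ≥ 2 so that (h − 2)! is defined.

```python
import math


def E(k, z):
    # Partial exponential sum of equation (14); an empty sum (k < 0) gives 0.
    return sum(z ** j / math.factorial(j) for j in range(k + 1))


def g_closed_form(x, sigma, h):
    # Equation (15): probability of at least h - 1 further stimuli within sigma.
    return 1.0 - E(h - 2, x * sigma) * math.exp(-x * sigma)


def g_by_quadrature(x, sigma, h, n=4000):
    # Trapezoidal integration of the density in (13) over (0, sigma); needs h >= 2.
    def density(tau):
        return (x ** (h - 1) * tau ** (h - 2) * math.exp(-x * tau)
                / math.factorial(h - 2))
    step = sigma / n
    values = [density(i * step) for i in range(n + 1)]
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


print(g_closed_form(2.0, 1.5, 4), g_by_quadrature(2.0, 1.5, 4))
```

The two numbers agree to within the quadrature error, which is a quick way to catch a misplaced index in (15).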


Our next step is to compute the expected time of occurrence of
the firing, following the occurrence of $s$. This is clearly given by
averaging the time of occurrence of the $(h-1)$th stimulus
following $s$ over the interval $\sigma$. This average is given by

$$\begin{aligned}
\frac{\displaystyle\int_0^{\sigma} \frac{x^{h-1}\,\tau^{h-1}}{(h-2)!}\,e^{-x\tau}\,d\tau}
     {\displaystyle\int_0^{\sigma} \frac{x^{h-1}\,\tau^{h-2}}{(h-2)!}\,e^{-x\tau}\,d\tau}
 &= \frac{e^{-x\sigma}[(x\sigma)^{h-1} + (h-1)(x\sigma)^{h-2} + \cdots + (h-1)!] - (h-1)!}
         {x\{e^{-x\sigma}[(x\sigma)^{h-2} + (h-2)(x\sigma)^{h-3} + \cdots + (h-2)!] - (h-2)!\}} \\
 &= \frac{(h-1)\,[1 - e^{-x\sigma}E_{h-1}(x\sigma)]}{x\,[1 - e^{-x\sigma}E_{h-2}(x\sigma)]}.
\end{aligned} \tag{17}$$

For very large $\sigma$ or for very large $x$, the expected time of the
$(h-1)$th incidence, measured from any incident stimulus, falls
within $\sigma$ and is approximately equal to $(h-1)/x$, as is seen from the last
expression of equation (17).
We can now write down the total expected time $\bar{t}$ of the next
firing from an instant of firing of our neuron. This will be seen to
be the sum of 1) the refractory period; 2) the expected time of $s$;
and 3) the expected time of the $(h-1)$th stimulus following $s$. Hence

$$\bar{t} = \delta + \frac{1 + (h-1)\,[1 - e^{-x\sigma}E_{h-1}(x\sigma)]}{x\,[1 - e^{-x\sigma}E_{h-2}(x\sigma)]}. \tag{18}$$

Therefore, the average frequency, given by the reciprocal of $\bar{t}$, will be

$$f(x,\delta,\sigma,h) = \frac{x\,[1 - e^{-x\sigma}E_{h-2}(x\sigma)]}{\delta x\,[1 - e^{-x\sigma}E_{h-2}(x\sigma)] + 1 + (h-1)\,[1 - e^{-x\sigma}E_{h-1}(x\sigma)]}. \tag{19}$$


The Properties of the Facilitation Input-Output Curve. 


We wish to examine the behavior of the function (19) for small
and large values of $x$, where the other parameters are held fixed.
To do this, we first examine the expression $[1 - e^{-x\sigma}E_{h-2}(x\sigma)]$. This
can be written as

$$e^{-x\sigma}[e^{x\sigma} - E_{h-2}(x\sigma)] = e^{-x\sigma} \sum_{k=h-1}^{\infty} \frac{(x\sigma)^k}{k!}
 = \sum_{k=0}^{\infty} \frac{(-x\sigma)^k}{k!}\; \sum_{k=h-1}^{\infty} \frac{(x\sigma)^k}{k!}. \tag{20}$$


This product of two infinite series is seen to be itself an
infinite series whose lowest power term is $(x\sigma)^{h-1}/(h-1)!$. But then
the numerator of the right side of (19) is also a series, whose
lowest power term is $x^h\sigma^{h-1}/(h-1)!$ and this is the dominant term for
small values of $x$. On the other hand, the denominator of the right
side of (19) approaches unity as $x$ becomes vanishingly small. We
thus have


Theorem 2. The facilitation input-output curve for randomized
stimuli behaves like $x^h\sigma^{h-1}/(h-1)!$ for small values of $x$.

For very large $x$, the first term of the denominator of (19)
becomes dominant so that the other two may be neglected. But then,
canceling the brackets in the numerator and denominator, we obtain
$1/\delta$ for the average frequency, which should be the case.

Let us now consider the behavior of the function (19) with
respect to some limiting values of $h$. If we define $E_{-1}(z) = 0$, which
is consistent with the general definition of $E_k(z)$, since $E_k$ may be
defined by the recurrence relation

$$\frac{dE_k(z)}{dz} = E_{k-1}(z); \qquad E_k(0) = 1, \tag{21}$$

then it is easily seen that for $h = 1$, the expression (19) reduces to

$$f(x,\delta) = \frac{x}{\delta x + 1}. \tag{22}$$


This is the output of a single neuron responding to every stimulus of
a Poisson shower, as previously shown (Rapoport, 1950a). On the
other hand, if $h$ is large, then, since

$$\lim_{k\to\infty} [E_k(z) - E_{k-1}(z)] = \lim_{k\to\infty} \frac{z^k}{k!} = 0, \tag{23}$$

we may write with good approximation for (19)

$$f(x,\delta,\sigma,h) \approx \frac{x\,q(x\sigma)}{(\delta x + h - 1)\,q(x\sigma) + 1}, \tag{24}$$

where $q(x\sigma) = 1 - e^{-x\sigma}E_{h-2}(x\sigma)$. Formula (24) indicates how the
simple output curve (22) is modified by the introduction of
facilitation together with a large threshold.
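
The limiting cases discussed above can be checked by evaluating (19) directly. The following is a minimal sketch with arbitrary parameter values; the function names are hypothetical, and the formula is the reading of (19) in which the numerator bracket carries the partial sum of order h − 2 and the last denominator term the partial sum of order h − 1.

```python
import math


def E(k, z):
    # Partial exponential sum of equation (14); an empty sum (k < 0) gives 0.
    return sum(z ** j / math.factorial(j) for j in range(k + 1))


def output_frequency(x, delta, sigma, h):
    # Assumed reading of equation (19) for randomized stimuli.
    z = x * sigma
    q_lower = 1.0 - math.exp(-z) * E(h - 2, z)
    q_upper = 1.0 - math.exp(-z) * E(h - 1, z)
    return x * q_lower / (delta * x * q_lower + 1.0 + (h - 1) * q_upper)


# Small x: Theorem 2 predicts x**h * sigma**(h-1) / (h-1)!.
x_small, sigma, h = 1e-3, 2.0, 3
predicted = x_small ** h * sigma ** (h - 1) / math.factorial(h - 1)
print(output_frequency(x_small, 1.0, sigma, h) / predicted)

# Large x: the output saturates at 1 / delta.
print(output_frequency(1e4, 0.5, sigma, h), 1.0 / 0.5)

# h = 1: the curve reduces to x / (delta * x + 1), equation (22).
print(output_frequency(3.0, 0.5, sigma, 1), 3.0 / (0.5 * 3.0 + 1.0))
```

Evaluating the function on a grid of x values between these extremes gives the sigmoid shape referred to in the text.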

The graph of (19) is shown in Figure 3.

FIGURE 3. Output of a neuron with facilitation period receiving stimuli
randomized in time. (Axes: $f(x)$ against $x$.)

It is a sigmoid curve depending on the parameters $\delta$, $\sigma$, and $h$, of which only two are
essential, since the choice of time units may make either $\delta$ or $\sigma$ equal
to one. To fix ideas, let $\delta = 1$. The question arises whether the
parameters $\sigma$ and $h$ can be so chosen that the curve approaches the
all-or-none curve, or at least exhibits a threshold effect as in Figure 2.
That is to say, can $\sigma$ and $h$ be so chosen that the curve starts out
with negligible slope for a finite range of the input and suddenly
rises to almost its maximum value, i.e., the slope in the
neighborhood of the inflexion point becomes very large, while it is small
everywhere else? This question may be of importance in the mathematical
theory of evolving organisms in the following way. Suppose that an 
organism has mechanisms which depend on neural aggregates re- 
sponding according to an all-or-none law or having a frequency 
threshold of response. If it could be shown that such mechanisms 
can be constructed of neurons such as described in this paper, one 
could make a hypothesis that such neurons arose through natural 
selection, where those individuals were successively selected whose 
parameters $\sigma$ and $h$ approached gradually their “optimum” values.

Our conjecture is that such is not the case, because there is a
finite upper bound for the values of $df(x,\delta,\sigma,h)/dx$ whatever be
the values of $\sigma$ and $h$. The justification of the conjecture will be
investigated elsewhere. We will, however, present a different
mechanism based on neurons of the type described here which does exhibit
threshold and all-or-none effects.


Threshold Effect by Pyramiding. 


Consider a net as shown in Figure 4 consisting of an aggregate
of $n$ neurons $N_1$, each of which has an output curve as in Figure 3
and all of which converge on $N_2$, a neuron of higher order. For
simplicity all the parameters $\delta$, $\sigma$, and $h$ are taken to be the same for all
neurons. If $n$ is large, the firings of the neurons $N_1$ will be
sufficiently randomized so that the same mathematical treatment is applicable
to the output curve of $N_2$ as has been given above, except that the
input will be the combined total output of the neurons $N_1$. We will
then have for the output of $N_2$

$$F(x) = f[nf(x)]. \tag{25}$$

FIGURE 4

The above discussion of $f(x)$ has shown that it is a sigmoid
curve with zero initial slope and a constant limiting value. To see
how such a curve is modified by the transformation (25), we must
compare the effects of the inputs $x$ and $nf(x)$. Figure 5 shows a
simultaneous graph of the functions $y = x$ and $y = nf(x)$. Note that
$f(x)$ is a monotone increasing function. Thus there always exists a
value $x = x^*$, such that $nf(x) < x$ for $x < x^*$ and $nf(x) > x$ for
$x > x^*$. If we simultaneously plot $f(x)$ and $F(x)$, we will also have
$F(x) < f(x)$ for $x < x^*$ and $F(x) > f(x)$ for $x > x^*$, as shown in
Figure 6.

FIGURE 5. A comparison of the input from the outside, $y = x$, with the
input from $n$ neurons $N_1$, $y = nf(x)$.

FIGURE 6. A comparison of the output of the neurons $N_1$, $y = f(x)$, with
the output of the neuron $N_2$, $y = F(x) = f[nf(x)]$.


Since $n$ can be taken arbitrarily large, the effectiveness of
depressing and raising $F(x)$ on both sides of $x^*$ can be made
arbitrarily large. It is true, however, that $x^*$ is a function of $n$; in fact,
it is vanishingly small when $n$ is very large. This spoils the main
feature of the threshold effect, since it depresses the threshold to zero.
However, this situation can be remedied by increasing $h$. A large $h$
flattens $f(x,h)$ and, therefore, $nf(x,h)$ near the origin. Therefore
a large $h$ has an effect opposite to that of large $n$. Hence we see that
by choosing $n$ and $h$ sufficiently large, we can approach the
all-or-none effect with respect to frequency of stimuli by a pyramiding
arrangement.
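
The pyramiding transformation (25) is easy to evaluate once the single-neuron curve is available. The sketch below is illustrative and rests on assumptions: it uses the same reading of (19) as the text's formula for randomized stimuli, arbitrary parameter values, and hypothetical function names.

```python
import math


def E(k, z):
    # Partial exponential sum of equation (14); an empty sum (k < 0) gives 0.
    return sum(z ** j / math.factorial(j) for j in range(k + 1))


def f(x, delta, sigma, h):
    # Single-neuron output for randomized stimuli, assumed reading of (19).
    z = x * sigma
    q_lower = 1.0 - math.exp(-z) * E(h - 2, z)
    q_upper = 1.0 - math.exp(-z) * E(h - 1, z)
    return x * q_lower / (delta * x * q_lower + 1.0 + (h - 1) * q_upper)


def pyramid_output(x, n, delta, sigma, h):
    # Equation (25): n identical first-order neurons feed one second-order neuron.
    return f(n * f(x, delta, sigma, h), delta, sigma, h)


n, delta, sigma, h = 20, 1.0, 1.0, 3   # illustrative values only
for x in (0.1, 0.5, 1.0, 2.0):
    print(x, f(x, delta, sigma, h), pyramid_output(x, n, delta, sigma, h))
```

For these values the second-order output lies below the single-neuron curve at the low input and above it at the higher inputs, which is the depressing and raising on either side of the crossing point described above.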

This investigation is part of the work done under Contract No. 
AF19 (122)-161 between the U. S. Air Force Cambridge Research 
Laboratories and The University of Chicago.


ON A MATHEMATICAL THEORY OF THE REACTION OF
CELLS TO X-RAY IRRADIATION 


EDMUND PINNEY 
UNIVERSITY OF CALIFORNIA 


To describe the variation of concentration of certain substances in 
cells subjected to X-rays for a finite length of time, R. M. Sievert 
(1941) has given an equation of mixed differences involving time-lag 
and having discontinuous coefficients. In the present paper Laplace 
transform methods are applied to solve this equation in terms of a se- 
ries valid for all values of time. In particular, the conditions given by 
J. Th. van der Werff (1948) that the concentration approaches a con- 
stant value are confirmed; otherwise steady oscillations in concentra- 
tion may be expected, or oscillations appear whose amplitude steadily 
increases until the cell is destroyed or the assumed mathematical model 
otherwise fails to apply. 


R. M. Sievert (1941) has studied the concentration of certain
substances in cells under the action of X-ray irradiation. Assuming
the reaction of the cell to deviations from the equilibrium
concentration to be reversible and to occur with a time-lag $\tau$,
Sievert obtained

$$\frac{dx(t)}{dt} = -I\alpha_t x(t) + R\beta_t\,[x_0 - x(t - \tau)], \tag{1}$$

where $x(t)$ measures the concentration of the substance, $x_0$ is an
equilibrium value of $x(t)$, $I$ is an irradiation constant depending
upon the incoming radiation, and $R$ is a constant measuring the
reaction of the cell to a departure from the equilibrium concentration.
The quantities $\alpha_t$ and $\beta_t$ are defined by

$$\alpha_t = 1, \quad 0 < t < T; \qquad \alpha_t = 0, \quad t > T, \tag{2}$$

$$\beta_t = 0, \quad 0 < t < \tau; \qquad \beta_t = 1, \quad t > \tau, \tag{3}$$

$T$ being the irradiation time. The irradiation constant $I$ is positive
or negative according to whether the effect of the radiation is to
decrease or increase the concentration of the substance.

Sievert has given graphical solutions for some particular values of
the parameters. J. Th. van der Werff (1948) has applied the method of
generating functions to Sievert’s equations and determined the
conditions under which the solutions asymptotically approach constant
values. He has shown that, as $t \to \infty$,

$$\begin{aligned}
x(t) &\to x_0 && \text{when } 0 < T < \infty \text{ and } R\tau < \pi/2, \\
x(t) &\to x_0 R/(R + I) && \text{when } T = \infty \text{ and } R < I, \\
x(t) &\to x_0 R/(R + I) && \text{when } T = \infty,\ R > I,\ R\tau < \phi/\sin\phi,
\end{aligned} \tag{4}$$

where $\phi$ is the smallest positive root of $\cos\phi + I/R = 0$.
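
The limits in (4) can be checked by integrating the delay equation directly. The following sketch reads equation (1) as $dx/dt = -I\alpha_t x(t) + R\beta_t[x_0 - x(t - \tau)]$ and steps it with a fixed-step explicit Euler scheme. The scheme is our choice for illustration, not the method of Sievert or van der Werff, and the function name `simulate` and the parameter values are assumptions.

```python
import math


def simulate(I, R, tau, T, x0=1.0, t_end=60.0, dt=0.001):
    """Explicit-Euler integration of equation (1).

    dx/dt = -I*alpha_t*x(t) + R*beta_t*(x0 - x(t - tau)), with
    alpha_t = 1 for t < T (else 0) and beta_t = 1 for t > tau (else 0).
    Returns the list of samples x(0), x(dt), ..., x(t_end).
    """
    lag = round(tau / dt)          # delay expressed in steps
    steps = round(t_end / dt)
    xs = [x0]                      # the cell starts at equilibrium
    for k in range(steps):
        t = k * dt
        alpha = 1.0 if t < T else 0.0
        # beta_t = 0 until t reaches tau, so no restoring term before then
        restoring = R * (x0 - xs[k - lag]) if k >= lag else 0.0
        xs.append(xs[k] + dt * (-I * alpha * xs[k] + restoring))
    return xs


if __name__ == "__main__":
    # Permanent irradiation (T = infinity) with R < I: limit x0*R/(R+I).
    print(simulate(I=1.0, R=0.5, tau=1.0, T=math.inf)[-1], 0.5 / 1.5)
    # Finite irradiation with R*tau < pi/2: the concentration returns to x0.
    print(simulate(I=1.0, R=1.0, tau=1.0, T=2.0)[-1])
    # Finite irradiation with R*tau > pi/2: the oscillation keeps growing.
    tail = simulate(I=1.0, R=2.0, tau=1.0, T=2.0)[-5000:]
    print(max(abs(x - 1.0) for x in tail))
```

The third run lies outside the conditions of (4); in this sketch the deviation from $x_0$ grows without settling, in line with the oscillatory behavior discussed later in the paper.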

Other types of asymptotic behavior (oscillations, for instance) 
might be physically conceivable. More powerful methods are now 
available, so the asymptotic behavior may be determined for all 
ranges of the parameters. A solution for x(t), valid for all positive 
t, will be given in this paper, and this will, incidentally, give the 
asymptotic behavior desired. 

It follows from two theorems of E. M. Wright (1948, Theorems
1 and 8) that a unique differentiable solution $x(t)$ of (1) exists
which, together with its derivative, is exponentially bounded as
$t \to \infty$, that is, real constants $K$, $s_0$ exist such that as
$t \to \infty$

$$|x(t)| < Ke^{s_0 t}, \qquad |x'(t)| < Ke^{s_0 t}. \tag{5}$$

Therefore the Laplace transforms of $x(t)$ and $x'(t)$ exist for
$\mathrm{Re}(s) > s_0$. We denote the Laplace transform by

$$X(s) = \int_0^{\infty} e^{-st}x(t)\,dt, \qquad \mathrm{Re}(s) > s_0. \tag{6}$$

Integrating by parts, taking the initial value of $x(t)$ as the
equilibrium value $x_0$, there results

$$\int_0^{\infty} e^{-st}x'(t)\,dt = sX(s) - x_0. \tag{7}$$

Multiply equation (1) by $e^{-st}$ and integrate with respect to $t$
from 0 to $\infty$. Using (2), (3) and (7), we obtain

$$sX(s) - x_0 = Rx_0\int_{\tau}^{\infty} e^{-st}\,dt - x_0 ITF(sT) - R\int_{\tau}^{\infty} e^{-st}x(t - \tau)\,dt,$$

where

$$F(sT) = \frac{1}{x_0 T}\int_0^{T} e^{-st}x(t)\,dt. \tag{8}$$

Replacing $t$ by $t + \tau$ in the last integral of the above expression and
using (6) we have

$$(s + Re^{-s\tau})X(s) = x_0\left[1 + \frac{R}{s}e^{-s\tau} - ITF(sT)\right]. \tag{9}$$

Using the inverse Laplace transformation (Doetsch, 1943; Satz
6[6.5], p. 107), we find

$$x(t) = \frac{1}{2\pi i}\int_{p - i\infty}^{p + i\infty} X(s)e^{st}\,ds, \tag{10}$$

where $p > s_0$. That is, all the poles of $X(s)$ lie to the left of the line
$\mathrm{Re}(s) = p$.

In calculating $x(t)$ from (9) and (10), it is necessary to know
something about the roots of the transcendental equation

$$\sigma e^{\sigma} + r = 0 \tag{11}$$

in the case $r > 0$. This equation has been studied by a number of
authors, the first probably being L. Euler (1777). The work of F.
Schürer (1912, p. 175) is one of the most complete treatments of this
type. When $0 < r < 1/e$, (11) has two distinct negative real roots
which we will denote by $\sigma_{0+}(r)$ and $\sigma_{0-}(r)$, where
$\sigma_{0-}(r) < \sigma_{0+}(r) < 0$. As $r$ increases to $1/e$, these roots
coalesce to a double root, $\sigma_{0\pm}(1/e) = -1$. As $r$ increases above
$1/e$, these real roots disappear altogether; in their place a pair of
conjugate complex roots appears. This pair will also be denoted by
$\sigma_{0\pm}(r)$, the plus sign pertaining to the root with positive
imaginary part, and the minus sign pertaining to the root with negative
imaginary part.
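
The two real roots can also be found numerically instead of graphically. The sketch below reads (11) as $\sigma e^{\sigma} + r = 0$ and brackets one root on each side of $\sigma = -1$, where $\sigma e^{\sigma}$ has its minimum $-1/e$. Bisection is our choice of method here, and `real_roots` is an illustrative name.

```python
import math


def real_roots(r, iters=200):
    """Return (sigma_minus, sigma_plus), the two real roots of
    sigma*exp(sigma) + r = 0 for 0 < r < 1/e, by bisection."""
    if not 0.0 < r < 1.0 / math.e:
        raise ValueError("real roots exist only for 0 < r < 1/e")

    def g(sigma):
        return sigma * math.exp(sigma) + r

    def bisect(neg, pos):
        # invariant: g(neg) < 0 <= g(pos)
        for _ in range(iters):
            mid = 0.5 * (neg + pos)
            if g(mid) < 0.0:
                neg = mid
            else:
                pos = mid
        return 0.5 * (neg + pos)

    far = -2.0
    while g(far) <= 0.0:           # walk left until g turns positive
        far *= 2.0
    return bisect(-1.0, far), bisect(-1.0, 0.0)


if __name__ == "__main__":
    print(real_roots(0.2))
    print(real_roots(1.0 / math.e - 1e-9))   # both roots close to -1
```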

In addition, (11) has infinitely many other pairs of conjugate
complex roots. Schürer has shown that these roots all exceed the
roots $\sigma_{0\pm}(r)$ in absolute magnitude and in fact may be ordered
$\sigma_{1\pm}(r)$, $\sigma_{2\pm}(r)$, $\cdots$ in such a way that

$$|\sigma_{0+}(r)| \le |\sigma_{0-}(r)| < |\sigma_{1\pm}(r)| < |\sigma_{2\pm}(r)| < \cdots. \tag{12}$$

Setting

$$\sigma = \alpha + i\beta \tag{13}$$

in (11) and equating real and imaginary parts to zero, we have

$$e^{\alpha}(\alpha\cos\beta - \beta\sin\beta) + r = 0, \qquad \beta\cos\beta + \alpha\sin\beta = 0.$$

Solving the second equation for $\alpha$ and inserting this in the first we
obtain

$$\alpha = -\beta/\tan\beta, \tag{14}$$

$$f(\beta) = \beta/\tan\beta - \ln(\beta/\sin\beta) = -\ln r. \tag{15}$$

The quantity $\beta$ may be determined from (15) and $\alpha$ from (14).
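
The two-step recipe can be sketched in code. The example reads (14) as $\alpha = -\beta\cot\beta$ and (15) as $\beta\cot\beta - \ln(\beta/\sin\beta) = -\ln r$, solves (15) for $\beta$ by bisection on the interval $(2n\pi, (2n+1)\pi)$, where $f$ decreases, and then substitutes into (14). The bisection and the function names are our assumptions; the paper itself proceeds graphically.

```python
import cmath
import math


def f(beta):
    """Left-hand side of equation (15)."""
    return beta / math.tan(beta) - math.log(beta / math.sin(beta))


def complex_root(r, n=0, iters=200):
    """Root alpha + i*beta of sigma*exp(sigma) + r = 0 with beta in
    (2*n*pi, (2*n + 1)*pi), obtained from equations (14) and (15)."""
    target = -math.log(r)
    lo = 2 * n * math.pi + 1e-9        # f(lo) is large (close to 1 if n = 0)
    hi = (2 * n + 1) * math.pi - 1e-9  # f(hi) is large and negative
    if not f(hi) < target < f(lo):
        raise ValueError("no complex root on this branch (r <= 1/e with n = 0?)")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:            # f decreases across the interval
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = -beta / math.tan(beta)     # equation (14)
    return complex(alpha, beta)


if __name__ == "__main__":
    s = complex_root(1.0)
    print(s, abs(s * cmath.exp(s) + 1.0))    # residual of equation (11)
    print(complex_root(math.pi / 2))         # real part vanishes at r = pi/2
    print(complex_root(1.0, n=1))            # next pair lies further left
```

The case $r = \pi/2$ is worth noting: there $\beta = \pi/2$ and $\alpha = 0$, which is the boundary between decaying and growing exponentials.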


FIGURE 1


The real roots may be determined graphically from Figure 1
and equation (11). The complex roots may also be determined
graphically. The function $f(\beta)$ has been tabulated by R. Frisch and
H. Holme (1935) and is shown graphically in Figure 2. From this
graph and (15), the successive values $\beta_0(r)$, $\beta_1(r)$, $\beta_2(r)$,
$\cdots$, of $\beta$ can be determined for each $r$. Then, using (14), the
corresponding values $\alpha_0(r)$, $\alpha_1(r)$, $\alpha_2(r)$, $\cdots$ of
$\alpha$ can be determined from

$$\alpha_n(r) = -\beta_n(r)\cot\beta_n(r). \tag{16}$$

FIGURE 2

The complex roots of (11) are then

$$\sigma_{n\pm}(r) = \alpha_n(r) \pm i\beta_n(r). \tag{17}$$


From equation (15) we see that $\beta_n$ is a first or second quadrant
angle. As $n \to \infty$, $\beta_n \to \infty$, for otherwise the roots of (15)
would have a point of condensation, implying $f(\beta) = 0$ in its region
of analyticity about this point, which is clearly not the case. As
$\beta \to \infty$, the first term of $f(\beta)$ in (15) dominates, and, in fact,
approaches infinity unless $\beta \to (2n + 1/2)\pi$, i.e.,
$\beta_n = (2n + 1/2)\pi - \epsilon_n$, where $\epsilon_n \to 0$ as $n \to \infty$.
Then $\sin\beta_n = 1 + O(\epsilon_n^2)$, $\cot\beta_n = \epsilon_n + O(\epsilon_n^3)$,
so (15) gives
$(2n + 1/2)\pi\epsilon_n - \ln[(2n + 1/2)\pi/r] = O(n\epsilon_n^3) + O(\epsilon_n^2) + O(\epsilon_n/n)$.
Therefore
$\epsilon_n = \delta_n + O(\epsilon_n^3) + O(\epsilon_n^2/n) + O(\epsilon_n/n^2) = \delta_n + O(\delta_n^3)$,
where

$$\delta_n = [(2n + 1/2)\pi]^{-1}\ln[(2n + 1/2)\pi/r].$$

Using (14) and (15) we find

$$\begin{aligned}
\alpha_n &= -\ln[\beta_n/(r\sin\beta_n)] = -\ln\{\sec\epsilon_n[(2n + 1/2)\pi - \epsilon_n]/r\} \\
&= -\ln[(2n + 1/2)\pi/r] - \epsilon_n^2/2 + \epsilon_n/[(2n + 1/2)\pi] + O(\epsilon_n^4).
\end{aligned}$$

Therefore, for sufficiently large $n$ we find

$$\begin{aligned}
\alpha_n(r) &= -\ln[(2n + 1/2)\pi/r] - \delta_n^2/2 + \delta_n/[(2n + 1/2)\pi] + O(\delta_n^4), \\
\beta_n(r) &= (2n + 1/2)\pi - \delta_n + O(\delta_n^3), \\
\delta_n &= [(2n + 1/2)\pi]^{-1}\ln[(2n + 1/2)\pi/r].
\end{aligned} \tag{18}$$

Transposing $r$ to the right side of (11), dividing by $\sigma$, taking
the absolute value, and applying (12) and (13) we obtain

$$\alpha_{0+}(r) \ge \alpha_{0-}(r) > \alpha_1(r) > \alpha_2(r) > \cdots. \tag{19}$$


Consider the special case in which $T = \infty$. Equations (6) and
(8) imply that $TF(sT) \to X(s)/x_0$ as $T \to \infty$. Using (9),

$$X(s) = \frac{x_0}{s}\,\frac{s + Re^{-s\tau}}{s + I + Re^{-s\tau}}. \tag{20}$$

This has poles at $s = 0$ and at $s = -I + (1/\tau)\sigma_{n\pm}(R\tau e^{I\tau})$
for $n = 0$, 1, 2, $\cdots$. The function $X(s)$ is bounded on the circle at
infinity. Therefore the contour in (10) may be completed by adding the
left-hand semi-circle at infinity when $t > 0$. Then applying Cauchy’s
theorem, for $R\tau e^{I\tau} \ne 1/e$, we find

$$\begin{aligned}
x(t) = \frac{x_0 R}{R + I} + x_0 I\tau\sum_{n=0}^{\infty}\Bigg\{
&\frac{\exp[-It + t\sigma_{n+}(R\tau e^{I\tau})/\tau]}{[I\tau - \sigma_{n+}(R\tau e^{I\tau})][1 + \sigma_{n+}(R\tau e^{I\tau})]} \\
+ &\frac{\exp[-It + t\sigma_{n-}(R\tau e^{I\tau})/\tau]}{[I\tau - \sigma_{n-}(R\tau e^{I\tau})][1 + \sigma_{n-}(R\tau e^{I\tau})]}\Bigg\}.
\end{aligned} \tag{21}$$

From (17) and (18) we see that these series are uniformly convergent
in $t$ for $t > 0$, and all but at most a finite number of terms
of the series tend exponentially to zero as $t \to \infty$. It is not
difficult to show that the second and third conditions of (4) are
precisely the conditions that all the exponents in (21) have negative
real parts, so the last two results of (4) are verified. If $R > I$,
$R\tau = \phi/\sin\phi$, where $\phi$ is the smallest positive root of
$\cos\phi + I/R = 0$, $x(t)$ oscillates with a frequency $\phi/\tau$ and an
amplitude which may be obtained by setting
$\sigma_{0\pm}(R\tau e^{I\tau}) - I\tau = \pm i\phi$ in the $n = 0$ terms of (21).
Finally, for $R\tau > \phi/\sin\phi$, $x(t)$ undergoes a negatively damped
oscillation which increases in amplitude until the regime of equation (1)
ceases to apply.

Now consider the case in which $T < \infty$. In this case the solution
(21) applies in the interval $0 < t < T$. Therefore, using (8)
and (21) we obtain

$$\begin{aligned}
F(s) = {} & \frac{R}{R + I}\,\frac{1 - e^{-s}}{s} \\
& + I\tau\sum_{m=0}^{\infty}\{1 - \exp[-s - IT + \sigma_{m+}(R\tau e^{I\tau})T/\tau]\} \\
& \qquad \times \{[s + IT - \sigma_{m+}(R\tau e^{I\tau})T/\tau][I\tau - \sigma_{m+}(R\tau e^{I\tau})][1 + \sigma_{m+}(R\tau e^{I\tau})]\}^{-1} \\
& + I\tau\sum_{m=0}^{\infty}\{1 - \exp[-s - IT + \sigma_{m-}(R\tau e^{I\tau})T/\tau]\} \\
& \qquad \times \{[s + IT - \sigma_{m-}(R\tau e^{I\tau})T/\tau][I\tau - \sigma_{m-}(R\tau e^{I\tau})][1 + \sigma_{m-}(R\tau e^{I\tau})]\}^{-1}.
\end{aligned} \tag{22}$$

In the particular case when $T < \tau$, a simpler expression may be
obtained. For $0 < t < T$, using (1), (2) and (3), we have

$$x(t) = x_0 e^{-It}. \tag{23}$$

Using (8) we also have

$$F(s) = \frac{1 - e^{-(s + IT)}}{s + IT}. \tag{24}$$

The expressions on the right-hand sides of (22) and (24) are
without singularities except on the right-hand semi-circle at infinity.
If either expression is substituted in (10) and the contour of (10)
is completed, as it may be for $t > T - \tau$ by adding the left-hand
semi-circle at infinity, Cauchy’s theorem may be applied, giving

$$x(t) = x_0 - x_0 IT\sum_{n=0}^{\infty}\left\{\frac{F[\sigma_{n+}(R\tau)T/\tau]}{1 + \sigma_{n+}(R\tau)}\exp[t\sigma_{n+}(R\tau)/\tau] + \frac{F[\sigma_{n-}(R\tau)T/\tau]}{1 + \sigma_{n-}(R\tau)}\exp[t\sigma_{n-}(R\tau)/\tau]\right\}, \tag{25}$$

where $F(s)$ is obtained from either (22) or (24).

From (25) we see that $x(t) \to x_0$, provided all the exponents
on the right-hand side have negative real parts. Because of equation
(19) this will be true if $\alpha_{0+}(R\tau) < 0$. For $R\tau < 1/e$,
$\sigma_{0\pm}(R\tau)$ are negative real roots, so this is true. For
$R\tau > 1/e$, $\sigma_{0\pm}(R\tau)$ are conjugate complex. From (16), we have
$\alpha_0(R\tau) < 0$ if $\beta_0(R\tau)$ lies in the first quadrant. From (15)
and Figure 2 we see that the condition for this is that $R\tau < \pi/2$.
The first part of (4) is thus verified.

When $R\tau = \pi/2$, $\beta_0 = \pi/2$, $x(t)$ oscillates with a frequency
$\pi/(2\tau) = R$ and an amplitude which may be obtained by setting
$\sigma_{0\pm}(R\tau) = \pm\pi i/2$ in the $n = 0$ terms of (25). Finally, for
$R\tau > \pi/2$, $x(t)$ undergoes a negatively damped oscillation which
increases in amplitude until the regime of equation (1) ceases to apply.

Equation (1) can be expected to be only a more-or-less rough
approximation to the actual behavior of a cell under X-ray irradiation.
A better approximation might involve non-linear terms. Generally the
effect of non-linearities is to spread out the region of undamped
oscillations. Thus one might expect undamped oscillations in an
interval about $R\tau = \phi/\sin\phi$ in the case $T = \infty$, and in an
interval about $R\tau = \pi/2$ in the case $T < \infty$.

This work was done in connection with the Office of Naval Re-
search contract N6-ONR 251 TO 2 at Stanford University.


The effect of the introduction of either a constant or a variable
threshold into Rashevsky’s theory of imitative behavior is considered.
It is found that in either case it is possible to have five equilibrium
configurations, three of which may be stable.


A theory of imitative behavior has been developed by N. Rashevsky
(1949) in which two mutually exclusive activities are considered.
The following discussion presupposes the reader’s familiarity with
that paper. The tendency to perform either activity $R_1$ or $R_2$ is
considered to be determined by the observed performance of others.
This tendency, together with the inherent individual bias and
random fluctuations in the discrimination mechanism (Landahl,
1938), determines the final response. Let $p(\lambda)$ be the distribution
of fluctuations, which is approximately a normal error curve, and
let $N(\phi)$ be the distribution of biases in favor of $R_1$. If $\psi$ is a
measure of the tendency to prefer $R_1$ to $R_2$ owing to imitation, and
the intensity of excitation is proportional to the difference between
the numbers $X$ and $Y$ exhibiting $R_1$ and $R_2$ respectively, then $\psi$ is
given by [Rashevsky, 1949, equation (8)]

$$\frac{d\psi}{dt} = A(X - Y) - a\psi. \tag{1}$$


Constant Threshold. Using the same arguments as those of N.
Rashevsky (1949), but now introducing a threshold $h$ (Landahl,
1938), we find for the number $X$ of individuals exhibiting behavior
$R_1$ the expression

$$X = \int_{-\infty}^{\infty} N(\phi)\int_{-\psi - \phi + h}^{\infty} p(\lambda)\,d\lambda\,d\phi. \tag{2}$$

The number $Y$ which exhibits behavior $R_2$ is given by a similar
expression, except that the limits of integration with respect to
$\lambda$ are $-\infty$ and $(-\psi - \phi - h)$. The number showing neither $R_1$
nor $R_2$ then is $N_0 - X - Y$, where $N_0$ is the total number.

Using the notation

$$2N_0 I(\psi) = \int_{-\infty}^{\infty} N(\phi)\left[\int_{-\psi - \phi + h}^{\infty} p(\lambda)\,d\lambda - \int_{-\infty}^{-\psi - \phi - h} p(\lambda)\,d\lambda\right]d\phi, \tag{3}$$

we find that equation (1) may be written as

$$\frac{d\psi}{dt} = 2AN_0 I(\psi) - a\psi. \tag{4}$$

We shall now evaluate the function $I(\psi)$ for the case in which
both $p(\lambda)$ and $N(\phi)$ are normal distribution functions, but only
$p(\lambda)$ is necessarily symmetric with respect to $\lambda = 0$.


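Let

$$g(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2} \tag{5}$$

and

$$G(x) = \int_0^{x} g(\xi)\,d\xi. \tag{6}$$

If $x = \phi/\sigma$, $\sigma$ being the standard deviation and $\phi_0 = \sigma x_0$
being the mean value of the function $N(\phi)$, then
$N(\phi)\,d\phi = N_0 g(x - x_0)\,dx$. Thus if $s$ is the standard deviation of
$p(\lambda)$, equation (3) becomes

$$2I(\psi) = \int_{-\infty}^{\infty} g(x - x_0)\left[-G\!\left(\frac{-\psi - \sigma x + h}{s}\right) - G\!\left(\frac{-\psi - \sigma x - h}{s}\right)\right]dx. \tag{7}$$

We now introduce the notation

$$\beta = \psi/s, \qquad \alpha = \sigma/s, \qquad H = h/s. \tag{8}$$

If $I(\beta s)$ $[= I(\psi)]$ is differentiated with respect to $\beta$ we obtain

$$\frac{\partial I(\beta s)}{\partial\beta} = \frac{1}{2\sqrt{2\pi(1 + \alpha^2)}}\left\{\exp\!\left[-\frac{(\beta + \alpha x_0 + H)^2}{2(1 + \alpha^2)}\right] + \exp\!\left[-\frac{(\beta + \alpha x_0 - H)^2}{2(1 + \alpha^2)}\right]\right\}. \tag{9}$$

Integrating this with respect to $\beta$ from zero to $\beta$ we obtain (cf.
Landau, 1950)

$$2I(\beta s) = G\!\left(\frac{\beta + \alpha x_0 + H}{\sqrt{1 + \alpha^2}}\right) + G\!\left(\frac{\beta + \alpha x_0 - H}{\sqrt{1 + \alpha^2}}\right). \tag{10}$$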

At equilibrium, $d\psi/dt = 0$ and, therefore, for equilibrium values
$\psi_e$ of $\psi$, equation (1) reduces to the following expression:

$$I(\psi_e) = a\psi_e/2AN_0. \tag{11}$$

If $I(\psi) > a\psi/2AN_0$, then, from equation (1), $d\psi/dt > 0$, and, hence,
$\psi$ increases. Conversely, $\psi$ decreases if the inequality sign is
reversed. Thus we may find out whether or not a point of equilibrium is
stable by showing that a small displacement away from a particular value
$\psi_e$ results in a spontaneous return to that value (cf. Rashevsky,
1949).

We shall consider only the case in which $x_0 = 0$. If we introduce
the notation

$$y = \psi/\sqrt{s^2 + \sigma^2}, \qquad H = h/\sqrt{s^2 + \sigma^2} \tag{12}$$

and set $x_0 = 0$, we obtain from equations (10) and (11)

$$I(\sqrt{s^2 + \sigma^2}\,y_e) = \tfrac{1}{2}G(y_e + H) + \tfrac{1}{2}G(y_e - H) = \frac{\sqrt{s^2 + \sigma^2}\,a}{2AN_0}\,y_e. \tag{13}$$

FIGURE 1

The expression $\tfrac{1}{2}G(y + H) + \tfrac{1}{2}G(y - H)$ may be considered
to be a function of $y$ with $H$ as a parameter. Figure 1 shows various
members from this family of curves. The right-hand side of equation
(13) is a straight line. Depending upon its slope and the value of $H$,
there may be one of the following ways in which the straight line
intersects any particular one of the family of curves:

a) one intersection at the origin if $H$ is small but
$\sqrt{s^2 + \sigma^2}\,a/2AN_0$ is large;

b) three intersections if either $H$ or $\sqrt{s^2 + \sigma^2}\,a/2AN_0$ is
sufficiently small;

c) five intersections if $H$ is sufficiently large and if
$\sqrt{s^2 + \sigma^2}\,a/2AN_0$ lies within certain limits.

The slope of $I$ at $y = 0$ is equal to $g(H)$. The maximum value
$m$ of the slope of the line, tangent to the curve $I$ and passing through
the origin, can be expressed, approximately, for $H \ge 1$ by the
empirical expression

$$m = 1/(3.8 + 2H) + 0.515e^{-2H}. \tag{14}$$

It is clear that if $m$ is greater than the slope of $I$ at $y = 0$, which
occurs if, and only if, $H > 1$, it is possible for a straight line through
the origin to intersect the curve $I$ twice to the right of the origin.
With the approximation (14) we may summarize the results as follows:

a) if $H \le 1$ and $\sqrt{s^2 + \sigma^2}\,a/2AN_0 \ge g(H)$, or if $H > 1$ and
$\sqrt{s^2 + \sigma^2}\,a/2AN_0 \ge m > g(H)$, there is but one point of
equilibrium. It is stable at the origin (cf. solid curves in Figure 1);

b) if $\sqrt{s^2 + \sigma^2}\,a/2AN_0 < g(H)$, there is an unstable
equilibrium at the origin and two stable equilibria at $\pm y_e$, which
can be calculated from equation (13) (cf. broken curves in Figure 1);

c) if $H > 1$ and if $m > \sqrt{s^2 + \sigma^2}\,a/2AN_0 > g(H)$, there is a
stable equilibrium at the origin, two unstable equilibria at $\pm y_e^*$ and
two stable equilibria at $\pm y_e^{**}$, $y_e^{**} > y_e^*$. Both $y_e^*$ and
$y_e^{**}$ satisfy equation (13) (cf. dotted curves in Figure 1).
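
The three cases can be reproduced by counting intersections numerically. The sketch below takes the left-hand side of (13) as $\tfrac{1}{2}G(y + H) + \tfrac{1}{2}G(y - H)$ with $G(x) = \tfrac{1}{2}\,\mathrm{erf}(x/\sqrt{2})$, and counts sign changes of its difference from a line through the origin on a fine grid. The grid search, the function names, and the sample values of $H$ and of the slope (which stands for $\sqrt{s^2 + \sigma^2}\,a/2AN_0$) are illustrative assumptions.

```python
import math


def G(x):
    """Integral of the standard normal density from 0 to x (odd in x)."""
    return 0.5 * math.erf(x / math.sqrt(2.0))


def I(y, H):
    """Left-hand side of equation (13): (G(y + H) + G(y - H)) / 2."""
    return 0.5 * (G(y + H) + G(y - H))


def count_equilibria(H, slope, dy=1e-3):
    """Count intersections of I(y, H) with the line slope*y.

    The origin is always one; each sign change of I - slope*y for
    y > 0 gives one more on each side, by the odd symmetry of I.
    """
    y_max = 0.5 / slope + 1.0       # I < 1/2, so no crossing beyond this
    crossings = 0
    y = dy
    prev = I(y, H) - slope * y
    while y < y_max:
        y += dy
        cur = I(y, H) - slope * y
        if (prev > 0.0) != (cur > 0.0):
            crossings += 1
        prev = cur
    return 1 + 2 * crossings


if __name__ == "__main__":
    for slope in (0.2, 0.1, 0.03):
        print("H = 2, slope =", slope, "->", count_equilibria(2.0, slope))
```

For $H = 2$ the slope at the origin is $g(2) \approx 0.054$, so the three sample slopes fall in cases a), c) and b) respectively.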

If $x_0 \ne 0$, it is clear that the equilibrium at the origin will be
displaced to an extent depending on the value of $x_0$, and the other
possible roots of equation (13) will no longer be symmetrically placed
about the origin.

If the distribution of $\lambda$ is approximated by an absolute value
exponential function, results similar to those above are readily
obtained.


Variable Threshold, No Bias. We consider next the case in
which the threshold $h$ varies from individual to individual and is
distributed for the whole population according to some distribution
function $N_0 q(h)$. For this case we neglect any individual bias and
thus set $\phi = 0$ for each member of the population. The number $X$
of individuals exhibiting $R_1$ is now

$$X = N_0\int_0^{\infty} q(h)\int_{-\psi + h}^{\infty} p(\lambda)\,d\lambda\,dh. \tag{15}$$

We shall consider the special case in which $p_1(\lambda)$ and $q(h)$ are
given by the following expressions:

$$p_1(\lambda) = \frac{k}{2}e^{-k|\lambda|}, \tag{16}$$

$$q(h) = \frac{ab}{a - b}\,(e^{-bh} - e^{-ah}). \tag{17}$$

We then find the following expression for $I(\psi)$ when $\psi > 0$:

$$I(\psi) = \frac{1}{2}\int_0^{\infty} q(h)\left[\int_{-\psi + h}^{\infty} p_1(\lambda)\,d\lambda - \int_{-\infty}^{-\psi - h} p_1(\lambda)\,d\lambda\right]dh, \tag{18}$$

or

$$2I(\psi) = 1 + \frac{b}{(k^2 - a^2)(a - b)}\,(k^2 e^{-a\psi} - a^2 e^{-k\psi}) - \frac{a}{(k^2 - b^2)(a - b)}\,(k^2 e^{-b\psi} - b^2 e^{-k\psi}). \tag{19}$$

Now if one of the parameters $a$ or $b$ in $q(h)$ is much larger
than the other, e.g. $a \gg b$, $q(h)$ is a simple exponential function
$be^{-bh}$, and

$$I_1(\pm\psi) = \pm\tfrac{1}{2}\{[k^2(1 - e^{-b\psi}) - b^2(1 - e^{-k\psi})]/(k^2 - b^2)\}, \qquad (a \gg b). \tag{20}$$

Thus $I_1'(\psi) > 0$, $I_1'(0) = kb/2(k + b)$, while $I_1''(\psi) < 0$ for
$\psi > 0$. Thus

a) if $a/2AN_0 > kb/2(k + b)$, the origin is the only equilibrium
point and it is stable;

b) if $a/2AN_0 < kb/2(k + b)$, the origin is an unstable equilibrium
point and there are two stable equilibria at $\pm\psi_e$, where
$I_1(\psi_e) = a\psi_e/2AN_0$.

If $a = b$, so that $q(h) = b^2 he^{-bh}$, we find

$$2I_2(\psi) = 1 + \frac{k^2(3b^2 - k^2 + b^3\psi - k^2 b\psi)\,e^{-b\psi} - b^2(k^2 + b^2)\,e^{-k\psi}}{(k^2 - b^2)^2}. \tag{21}$$

Furthermore, if $b = k$, it is readily shown that $I_2'(0) = b/8$ and
$I_2''(1/k) = 0$; hence there is a point of inflection at $1/k$. More
generally, we find that unless $a$ or $b$ becomes infinite there is an
inflection point. When $b > k$ the inflection is more pronounced and the
results of plotting $I(\psi)$ versus $b\psi$ for various values of $k$ are
similar to the results in the first case with constant $h$, but with
variable bias.

If $I'' > 0$ in the neighborhood of the origin, then there will be
an inflection point and it will be possible to have five equilibrium
points. The differentiation can be carried out before integrating
equation (18). The function $I$ will be found to be concave upward
near the origin if

$$q(0) < k\int_0^{\infty} q(h)e^{-kh}\,dh. \tag{22}$$

On the other hand, if $p(\lambda)$ is the normal distribution $g(h/s)$ and if
$q(h)$ is arbitrary, then the function $I$ will be concave upward in the
neighborhood of the origin if

$$\int_0^{\infty} g(h/s)\,(h^2/s^2 - 1)\,q(h)\,dh > 0. \tag{23}$$

It may be noted that one would expect the variability in the
threshold ($h$) or in the individual bias ($\phi$) to be greater than the
variability in the fluctuations. Thus, in this last case, the occurrence
of a pronounced inflection would seem to be rather likely.

The author wishes to express his appreciation to Drs. N. Ra- 
shevsky and H. G. Landau for reading and discussing the manuscript. 


LITERATURE 
Landahl, H. D. 1938. “A Contribution to the Mathematical Biophysics of
Psychophysical Discrimination.” Psychometrika, 3, 107-125.
Landau, H. G. 1950. “Note on the Effect of Imitation in Social Behavior.” Bull. 
Math. Biophysics, 12, 221-236. 
Rashevsky, N. 1949. “Mathematical Biology of Social Behavior: III.” Bull. 
Math. Biophysics, 11, 255-271. 


BULLETIN OF 
MATHEMATICAL BIOPHYSICS 
VOLUME 12, 1950 


PSYCHOPHYSICAL DISCRIMINATION WITH MORE 
THAN TWO STIMULI 


N. RASHEVSKY 
COMMITTEE ON MATHEMATICAL BIOLOGY 
THE UNIVERSITY OF CHICAGO 


A generalization is suggested of H. D. Landahl’s theory of
psychophysical discrimination by considering instead of two
cross-inhibiting parallel chains of neural pathways, $n$ such chains. The
probabilities of the $n$ possible different reactions are in such a case
expressed by $(n - 1)$ple integrals. For $n = 3$ the method is illustrated
by evaluating the probability of a reaction to the weakest of the three
stimuli.


As a generalization of H. D. Landahl’s (1938; Rashevsky, 1948,
chap. xxxiv; hereinafter referred to as MB) circuit of two parallel
cross-inhibitory pathways, consider the case of three such pathways.
Let the constants of the cross-inhibitory pathways be such that when
the excitations $\varepsilon_1$, $\varepsilon_2$ and $\varepsilon_3$ of the three
proximal connections are equal, the inhibitory effects just compensate
the excitatory. This requires that the amount of $j$ produced by the
cross-inhibitory pathway, $l,m$, leading from the $l$th pathway to the
$m$th ($l$, $m$ = 1, 2, 3) be equal to $\varepsilon_l/2$.

Consider the case in which

$$\frac{\varepsilon_1 + \varepsilon_2}{2} - \varepsilon_3 = \Delta > 0. \tag{1}$$

In the absence of fluctuations the reaction $R_3$ in the third pathway
will not occur when (1) holds. Due to fluctuations at the connections
it may, however, happen that the excitation at the third connection
will exceed the half-sum of the excitations at the first and second
connections, and a reaction $R_3$ will occur accidentally, as a “wrong”
reaction. If $\phi_1$, $\phi_2$ and $\phi_3$ denote the accidental additional
amounts of excitation at the three corresponding connections, due to
fluctuations, then in order that $R_3$ will occur, we must have

$$\frac{\varepsilon_1 + \phi_1 + \varepsilon_2 + \phi_2}{2} - (\varepsilon_3 + \phi_3) < 0, \tag{2}$$

or

$$\phi_3 > \Delta + \frac{\phi_1 + \phi_2}{2}. \tag{3}$$

Let the probability that any of the $\phi_i$’s will have a value between
$x$ and $x + dx$ be $p(x)\,dx$. That is, the probability of $\phi_1$ having a
value between $\phi_1$ and $\phi_1 + d\phi_1$ is $p(\phi_1)\,d\phi_1$, while the
probability of $\phi_2$ having a value between $\phi_2$ and $\phi_2 + d\phi_2$ is
$p(\phi_2)\,d\phi_2$, etc. Then the probability of inequality (3), when the
values $\phi_1$ and $\phi_2$ are known, is

$$\int_{\Delta + (\phi_1 + \phi_2)/2}^{\infty} p(x)\,dx = F(\Delta, \phi_1, \phi_2). \tag{4}$$

The probability that $\phi_1$ and $\phi_2$ have prescribed specific values is

$$p(\phi_1)\,p(\phi_2)\,d\phi_1\,d\phi_2. \tag{5}$$

Therefore, the probability that inequality (3) holds for specified
values $\phi_1$ and $\phi_2$ is given by

$$p(\phi_1)\,p(\phi_2)\,F(\Delta, \phi_1, \phi_2)\,d\phi_1\,d\phi_2. \tag{6}$$

Hence the probability that (3) will hold for any values of $\phi_1$ and
$\phi_2$, that is, the probability $P_w$ of $R_3$, is given by

$$P_w(\Delta) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(\phi_1)\,p(\phi_2)\,F(\Delta, \phi_1, \phi_2)\,d\phi_1\,d\phi_2. \tag{7}$$

A similar procedure may be applied to the case of $n$ stimuli, and
we obtain an expression for $P_w$ in terms of an $(n - 1)$ple integral.

In order to compute $P_w(\Delta)$ explicitly, we must specify the function
$p(x)$. For the same reasons as before we shall use the following
expression (Landahl, 1938)

$$p(x) = \frac{k}{2}e^{-k|x|}. \tag{8}$$

The function $p(x)$ has two different expressions, according to
whether $x > 0$ or $x < 0$. We have

$$\text{For } x > 0: \quad p(x) = p^{+}(x) = \frac{k}{2}e^{-kx}; \tag{9}$$

$$\text{For } x < 0: \quad p(x) = p^{-}(x) = \frac{k}{2}e^{kx}. \tag{10}$$

In evaluating $F(\Delta, \phi_1, \phi_2)$ we obtain different expressions,
depending on whether $\tfrac{1}{2}(\phi_1 + \phi_2) > -\Delta$ or
$\tfrac{1}{2}(\phi_1 + \phi_2) < -\Delta$. In the first instance we have

$$F^{+}(\Delta, \phi_1, \phi_2) = \frac{k}{2}\int_{\Delta + (\phi_1 + \phi_2)/2}^{\infty} e^{-kx}\,dx = \tfrac{1}{2}e^{-k[\Delta + (\phi_1 + \phi_2)/2]}. \tag{11}$$

In the second case we must divide the range of integration into a
negative and positive part, and find

$$F^{-}(\Delta, \phi_1, \phi_2) = \frac{k}{2}\int_{\Delta + (\phi_1 + \phi_2)/2}^{0} e^{kx}\,dx + \frac{k}{2}\int_0^{\infty} e^{-kx}\,dx = 1 - \tfrac{1}{2}e^{k[\Delta + (\phi_1 + \phi_2)/2]}. \tag{12}$$

In evaluating the integral (7) we must break up the range of
integration according to the values $p^{+}(x)$, $p^{-}(x)$, $F^{+}$, and $F^{-}$
of the functions $p(x)$ and $F(\Delta, \phi_1, \phi_2)$.
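First we break it according to the values of $p(x)$. We have from
(7)

$$P_w(\Delta) = \int_{-\infty}^{\infty} p(\phi_2)\,d\phi_2\int_{-\infty}^{0} p^{-}(\phi_1)F(\Delta, \phi_1, \phi_2)\,d\phi_1 + \int_{-\infty}^{\infty} p(\phi_2)\,d\phi_2\int_0^{\infty} p^{+}(\phi_1)F(\Delta, \phi_1, \phi_2)\,d\phi_1. \tag{13}$$

In evaluating the integral

$$\int_{-\infty}^{0} p^{-}(\phi_1)F(\Delta, \phi_1, \phi_2)\,d\phi_1 \tag{14}$$

we must find the relation between $\phi_1$ and $\phi_2$ within the range of
negative values of $\phi_1$. We have seen [equations (11) and (12)] that

$$F = F^{+} \text{ for } \phi_1 + \phi_2 > -2\Delta; \qquad F = F^{-} \text{ for } \phi_1 + \phi_2 < -2\Delta. \tag{15}$$

In the $\phi_1$, $\phi_2$ plane draw the straight line $\phi_1 + \phi_2 = -2\Delta$
(Figure 1). Above the line $F = F^{+}$; below the line $F = F^{-}$.

FIGURE 1

If $\phi_2 > -2\Delta$, then $F = F^{-}$ within the range of $\phi_1$ from
$-\infty$ to $-(2\Delta + \phi_2)$; while for $\phi_1$ between $-(2\Delta + \phi_2)$
and 0, $F = F^{+}$. If $\phi_2 < -2\Delta$, then $F = F^{-}$ for all negative
values of $\phi_1$. Hence the integral (14) can be written as follows.

For $\phi_2 > -2\Delta$:

$$\int_{-\infty}^{0} p^{-}(\phi_1)F\,d\phi_1 = \int_{-\infty}^{-(2\Delta + \phi_2)} p^{-}(\phi_1)F^{-}\,d\phi_1 + \int_{-(2\Delta + \phi_2)}^{0} p^{-}(\phi_1)F^{+}\,d\phi_1; \tag{16}$$

For $\phi_2 < -2\Delta$:

$$\int_{-\infty}^{0} p^{-}(\phi_1)F\,d\phi_1 = \int_{-\infty}^{0} p^{-}(\phi_1)F^{-}\,d\phi_1. \tag{17}$$

By a similar argument we find for the last integral of equation
(13):

For $\phi_2 > -2\Delta$:

$$\int_0^{\infty} p^{+}(\phi_1)F\,d\phi_1 = \int_0^{\infty} p^{+}(\phi_1)F^{+}\,d\phi_1. \tag{18}$$

For $\phi_2 < -2\Delta$:

$$\int_0^{\infty} p^{+}(\phi_1)F\,d\phi_1 = \int_0^{-(2\Delta + \phi_2)} p^{+}(\phi_1)F^{-}\,d\phi_1 + \int_{-(2\Delta + \phi_2)}^{\infty} p^{+}(\phi_1)F^{+}\,d\phi_1. \tag{19}$$

Because of (9) and (10) the range of integration with respect
to $\phi_2$ in the first term of the right-hand side of (13) must be broken
into $(-\infty, 0)$ and $(0, +\infty)$. Because of (16) and (17), the range
$(-\infty, 0)$ must again be broken into two: $(-\infty, -2\Delta)$ and
$(-2\Delta, 0)$. Keeping this in mind, we find for the first term of the
right-hand side of (13)

$$\begin{aligned}
&\int_{-\infty}^{-2\Delta} p^{-}(\phi_2)\,d\phi_2\int_{-\infty}^{0} p^{-}(\phi_1)F^{-}\,d\phi_1 \\
&\quad + \int_{-2\Delta}^{0} p^{-}(\phi_2)\,d\phi_2\left[\int_{-\infty}^{-(2\Delta + \phi_2)} p^{-}(\phi_1)F^{-}\,d\phi_1 + \int_{-(2\Delta + \phi_2)}^{0} p^{-}(\phi_1)F^{+}\,d\phi_1\right] \\
&\quad + \int_0^{\infty} p^{+}(\phi_2)\,d\phi_2\left[\int_{-\infty}^{-(2\Delta + \phi_2)} p^{-}(\phi_1)F^{-}\,d\phi_1 + \int_{-(2\Delta + \phi_2)}^{0} p^{-}(\phi_1)F^{+}\,d\phi_1\right].
\end{aligned} \tag{20}$$

In a similar way we break up the second term of the right-hand
side of (13). Using (9), (10), (16), (17), (18) and (19) and
evaluating all the integrals in (20) and in the corresponding
expression for the second term of the right side of (13), we finally
obtain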


$$P_w(\Delta) = \frac{8}{9}e^{-k\Delta} - \left(\frac{7}{18} + \frac{k\Delta}{6}\right)e^{-2k\Delta}. \tag{21}$$

For $\Delta = 0$, $P_w = \tfrac{1}{2}$. This at first seems paradoxical. When
$\varepsilon_1 = \varepsilon_2 = \varepsilon_3$ we have $\Delta = 0$, and we would expect
that because of complete symmetry the probability of any of the three
reactions would be $\tfrac{1}{3}$. We must remember, however, that whereas in
the case of two stimuli only one of the reactions may occur, in the
present case this is not necessarily so. Therefore $P_w(0)$ means the
probability for $\varepsilon_1 = \varepsilon_2 = \varepsilon_3$ of $R_3$ to occur
either alone or in combination with either $R_1$ or $R_2$, but not with
both.

Let us compare equation (21) with the one which we obtain if
we assume that one of the stimuli, say $\varepsilon_2$, is entirely cut out.
By this we do not mean that $\varepsilon_2 = 0$, for that case is covered by
the preceding argument. We actually consider one of the pathways as
not functioning, so that now we have

$$\frac{\varepsilon_1}{2} - \varepsilon_3 = \Delta. \tag{22}$$

We may now compute $P_w'(\Delta)$ of the reaction $R_3$.
The condition for $R_3$ to occur is now

$$\phi_3 > \Delta + \frac{\phi_1}{2}, \tag{23}$$

the probability of which is

$$\int_{\Delta + \phi_1/2}^{\infty} p(x)\,dx = F(\Delta, \phi_1). \tag{24}$$

The total probability $P_w'(\Delta)$ is given by

$$P_w'(\Delta) = \int_{-\infty}^{\infty} p(\phi_1)F(\Delta, \phi_1)\,d\phi_1, \tag{25}$$

and is readily computed in the same way as before. We find

$$P_w'(\Delta) = \frac{2}{3}e^{-k\Delta} - \frac{1}{6}e^{-2k\Delta}. \tag{26}$$

It is readily seen that for large values of $\Delta$, when the terms in
$e^{-2k\Delta}$ become small compared with those in $e^{-k\Delta}$,
$P_w'(\Delta) < P_w(\Delta)$.

It is also readily seen that for small values of $\Delta$,
$P_w'(\Delta) < P_w(\Delta)$. We have $P_w'(0) = P_w(0) = \tfrac{1}{2}$.
Furthermore for $\Delta = 0$ we have
$dP_w'/d\Delta = -k/3$; $dP_w/d\Delta = -5k/18$.
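
The two closed forms can be cross-checked by simulation. The sketch below takes (21) as $P_w(\Delta) = \tfrac{8}{9}e^{-k\Delta} - (\tfrac{7}{18} + \tfrac{k\Delta}{6})e^{-2k\Delta}$ and (26) as $P_w'(\Delta) = \tfrac{2}{3}e^{-k\Delta} - \tfrac{1}{6}e^{-2k\Delta}$; these readings reproduce the value $\tfrac{1}{2}$ at $\Delta = 0$ and the slopes $-5k/18$ and $-k/3$ quoted in the text. It then estimates the probability of inequality (3), or of (23) when one pathway is removed, by drawing the fluctuations from the density (8). The Monte Carlo approach, the function names and the sample sizes are our assumptions, not part of the paper.

```python
import math
import random


def laplace(rng, k):
    """One draw from the density (k/2)*exp(-k*|x|) of equation (8)."""
    x = rng.expovariate(k)
    return x if rng.random() < 0.5 else -x


def p_three(delta, k):
    """Closed form read from equation (21), three pathways."""
    return (8.0 / 9.0) * math.exp(-k * delta) - (
        7.0 / 18.0 + k * delta / 6.0) * math.exp(-2.0 * k * delta)


def p_two(delta, k):
    """Closed form read from equation (26), one pathway cut out."""
    return (2.0 / 3.0) * math.exp(-k * delta) - math.exp(-2.0 * k * delta) / 6.0


def estimate(delta, k, n_paths, trials=100_000, seed=0):
    """Monte Carlo frequency of phi_3 > delta + (sum of other phi's)/2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        others = sum(laplace(rng, k) for _ in range(n_paths - 1))
        if laplace(rng, k) > delta + 0.5 * others:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for d in (0.0, 0.5, 2.0):
        print(d, estimate(d, 1.0, 3), p_three(d, 1.0),
              estimate(d, 1.0, 2), p_two(d, 1.0))
```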

The author is indebted to Mr. George Karreman for checking the
manuscript.


The discussion given by N. Rashevsky (1949) on the effect of imi-
tation in the mathematical biology of social behavior is generalized
by assuming the distributions involved to be normal rather than Laplace
distributions, and also by showing how most of the results can be de-
rived without assuming any specific form for the distributions. In par-
ticular, it is demonstrated that it is possible, in a sufficiently large
population, to have a stable behavior pattern which is quite independ-
ent of the desires of the population or of their inherent pattern of re-
sponse.


Introduction. 

N. Rashevsky (1949, in the following referred to as R) considered
the effect of imitation in a population where each individual
could respond to a given stimulus with one of two possible reactions,
$R_1$ or $R_2$. Using the notation of R, each individual is characterized
by $\phi$, the net excitation to response $R_1$, where $\phi$ is distributed in
the population so that the number of individuals whose $\phi$ is between
$\phi$ and $\phi + d\phi$ is given by

$$N(\phi)\,d\phi = N_0 n(\phi)\,d\phi, \tag{1}$$

where $N_0$ is the total number of individuals in the population, and

$$\int_{-\infty}^{\infty} n(\phi)\,d\phi = 1. \tag{2}$$

In the absence of imitation, an individual responds with reaction $R_1$
if $\phi + \phi' > 0$ (and $R_2$ if $\phi + \phi' < 0$), where $\phi'$ is a random
excitation. This idea was introduced by H. D. Landahl (1938) in his
treatment of psychophysical discrimination. The probability density
function of $\phi'$ is $p(\phi')$, so that the probability of reaction $R_1$ for
an individual $\phi$ is

$$P(\phi) = \int_{-\phi}^{\infty} p(\phi')\,d\phi'. \tag{3}$$

It is assumed that the effect of imitation is to add an additional
excitation, the stimulus toward $R_1$ being proportional to $X$, the number
of individuals exhibiting $R_1$, and the corresponding stimulus toward
$R_2$ being proportional to $N_0 - X$. These excitations due to imitation
are assumed to follow the usual laws for neural excitation (Rashevsky,
1948), so that the net excitation to $R_1$ due to imitation, $\psi$, is
given by the differential equation

$$\frac{d\psi}{dt} = A(2X - N_0) - a\psi, \tag{4}$$

the positive parameters, $A$ and $a$, being assumed to have the same
value for all members of the population. The quantity $X$ is given by

$$X(\psi) = \int_{-\infty}^{\infty} P(\phi + \psi)N(\phi)\,d\phi. \tag{5}$$

In R it was assumed that $n(\phi)$ and $p(\phi')$ were Laplace distributions,
i.e.

$$n(\phi) = \frac{\sigma}{2}e^{-\sigma|\phi|}$$

and

$$p(\phi') = \frac{k}{2}e^{-k|\phi'|},$$

and the nature and stability of the solutions of (4) were studied.
Here similar results are derived assuming $n(\phi)$ and $p(\phi')$ to be
normal distributions. This is not only a more plausible form but the
calculations are simpler. It is also shown that most of the conclusions
may be established without assuming a specific form for the
distributions.

Normal Distribution: Unbiased Population. 


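We use the following notation for the normal distribution and
its integral

$$g(x) = (2\pi)^{-1/2}e^{-x^2/2}, \qquad G(x) = \int_0^{x} g(x')\,dx'. \tag{6}$$

If we now take for $n(\phi)$ and $p(\phi')$

$$n(\phi) = \frac{1}{\sigma}\,g\!\left(\frac{\phi}{\sigma}\right), \qquad p(\phi') = \frac{1}{s}\,g\!\left(\frac{\phi'}{s}\right), \tag{7}$$

then we obtain from (3) and (5),

$$X(\psi) = N_0\int_{-\infty}^{\infty}\left[\frac{1}{2} + G\!\left(\frac{\phi + \psi}{s}\right)\right]\frac{1}{\sigma}\,g\!\left(\frac{\phi}{\sigma}\right)d\phi. \tag{8}$$

Writing (4) as

$$\frac{d\psi}{dt} = 2AN_0 f(\psi) - a\psi = F(\psi), \tag{9}$$

so that

$$f(\psi) = \frac{X(\psi)}{N_0} - \frac{1}{2}, \tag{10}$$

we have from (8)

$$f(\psi) = \int_{-\infty}^{\infty} G\!\left(\frac{\phi + \psi}{s}\right)\frac{1}{\sigma}\,g\!\left(\frac{\phi}{\sigma}\right)d\phi. \tag{11}$$

This definite integral is easily evaluated. Putting $x = \phi/\sigma$,
$\alpha = \sigma/s$, $\beta = \psi/s$ in (11), we obtain

$$I = \int_{-\infty}^{\infty} G(\alpha x + \beta)\,g(x)\,dx$$

and

$$\begin{aligned}
\frac{\partial I}{\partial\beta} &= \int_{-\infty}^{\infty} g(\alpha x + \beta)\,g(x)\,dx \\
&= \frac{1}{2\pi}\exp\!\left[-\frac{\beta^2}{2(1 + \alpha^2)}\right]\int_{-\infty}^{\infty}\exp\!\left[-\frac{1 + \alpha^2}{2}\left(x + \frac{\alpha\beta}{1 + \alpha^2}\right)^2\right]dx \\
&= \frac{1}{\sqrt{1 + \alpha^2}}\,g\!\left(\frac{\beta}{\sqrt{1 + \alpha^2}}\right).
\end{aligned} \tag{12}$$

Hence

$$I = G\!\left(\frac{\beta}{\sqrt{1 + \alpha^2}}\right) + f(\alpha).$$

Since $G(x)$ is an odd function and $g(x)$ is an even function, $I = 0$
for $\beta = 0$, and hence $f(\alpha) = 0$. This gives

$$I = G\!\left(\frac{\beta}{\sqrt{1 + \alpha^2}}\right), \tag{13}$$

and therefore

$$f(\psi) = G\!\left(\frac{\psi}{\sqrt{s^2 + \sigma^2}}\right). \tag{14}$$

The differential equation (9) for $\psi$ now becomes

$$\frac{d\psi}{dt} = 2AN_0\,G\!\left(\frac{\psi}{\sqrt{s^2 + \sigma^2}}\right) - a\psi = F(\psi). \tag{15}$$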

The behavior of the possible solutions of (15) can now be determined.
We first note that $F(0) = 0$, so that $\psi = 0$ is a possible
solution. This will be stable, in the sense that if $\psi$ is changed by
a small amount to a value different from zero it will return to zero,
whenever $dF/d\psi < 0$ at $\psi = 0$; that is,

$$\frac{2AN_0}{\sqrt{2\pi(s^2+\sigma^2)}} - a < 0$$

or

$$\gamma = \frac{a\sqrt{s^2+\sigma^2}}{2AN_0} > (2\pi)^{-1/2} = .3989. \tag{16}$$

FIGURE 1. The alternatives for $F(\psi)$ for normal distributions, unbiased
population.


In this case

$$\frac{dF}{d\psi} = \frac{2AN_0}{\sqrt{s^2+\sigma^2}}\, g\!\left(\frac{\psi}{\sqrt{s^2+\sigma^2}}\right) - a
= a\left[\frac{1}{\gamma}\, g\!\left(\frac{\psi}{\sqrt{s^2+\sigma^2}}\right) - 1\right] \tag{17}$$

will be negative for all values of $\psi$, so that $d\psi/dt > 0$ for
$\psi < 0$ and $d\psi/dt < 0$ for $\psi > 0$. This means that no matter
what value $\psi$ may have initially, it tends to return to the value
$\psi = 0$. This is the situation shown in Curve I of Figure 1. Since
$\psi = 0$ gives $X = N_0/2$, this means that a population characterized
by values of the parameters in this range tends to be equally divided
between reactions $R_1$ and $R_2$.
If the inequality (16) is reversed to

$$\gamma < (2\pi)^{-1/2}, \tag{18}$$

then $\psi = 0$ will not be a stable solution. Suppose $\psi$ is
increased to a small positive value; we then have $F(\psi) > 0$, and from
equation (15) it can be seen that $\psi$ will continue to increase up to
the value for which $F(\psi) = 0$. That a positive root $\psi^*$ of
$F(\psi) = 0$ exists can be seen from the fact that

$$G\!\left(\frac{\psi}{\sqrt{s^2+\sigma^2}}\right) < \frac{1}{2}$$

while $a\psi$ increases indefinitely. This is the situation shown in
Curve II of Figure 1. Since

$$g\!\left(\frac{\psi}{\sqrt{s^2+\sigma^2}}\right)$$

is monotone decreasing for $\psi > 0$, it follows from (17) that there
is only one positive root, $\psi^*$. By symmetry, there is a negative
root of equal absolute value, $-\psi^*$. From (17) and (18), we see that
$dF(0)/d\psi > 0$ and that $dF/d\psi$ decreases monotonically for
$\psi > 0$; hence $dF(\psi^*)/d\psi < 0$, so that $\psi^*$ is a stable
value of $\psi$, and similarly $-\psi^*$ is also stable.

Hence we have the alternatives: either (16) holds, $\psi = 0$ is
stable, and the population is equally divided between $R_1$ and $R_2$
and always returns to this configuration after any fluctuation; or (18)
holds, $\psi = \psi^*$ and $\psi = -\psi^*$ are stable, and any positive
fluctuation causes the population to move toward and then remain at
$\psi = \psi^*$ with an excess of $R_1$ over $R_2$ (or a negative
fluctuation sends it to the stable value $\psi = -\psi^*$ with an excess
of $R_2$ over $R_1$). The equality

$$\gamma = (2\pi)^{-1/2} \tag{19}$$

can also be seen to give only $\psi = 0$ as a stable value.
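The two alternatives can be illustrated by integrating the differential equation numerically. The following is a rough sketch under stated assumptions, not a procedure from the paper: it takes (15) in the form $d\psi/dt = 2AN_0\, G(\psi/\sqrt{s^2+\sigma^2}) - a\psi$, uses a plain forward-Euler step, and picks illustrative parameter values ($A = a = s = \sigma = 1$, with $N_0 = 1$ or $N_0 = 4$) so that $\gamma$ falls on either side of the threshold $(2\pi)^{-1/2}$.

```python
import math


def G(x):
    """Integral of the standard normal density from 0 to x."""
    return 0.5 * math.erf(x / math.sqrt(2.0))


def F(psi, A, N0, a, s, sigma):
    """Right-hand side of the differential equation for psi."""
    return 2.0 * A * N0 * G(psi / math.hypot(s, sigma)) - a * psi


def gamma(A, N0, a, s, sigma):
    """Dimensionless parameter gamma = a*sqrt(s^2 + sigma^2) / (2*A*N0)."""
    return a * math.hypot(s, sigma) / (2.0 * A * N0)


def settle(psi0, A, N0, a, s, sigma, dt=0.01, steps=20000):
    """Forward-Euler integration from psi0; returns the final value of psi."""
    psi = psi0
    for _ in range(steps):
        psi += dt * F(psi, A, N0, a, s, sigma)
    return psi


if __name__ == "__main__":
    threshold = (2.0 * math.pi) ** -0.5
    for N0 in (1, 4):
        gam = gamma(1.0, N0, 1.0, 1.0, 1.0)
        ends = [settle(p0, 1.0, N0, 1.0, 1.0, 1.0) for p0 in (-0.1, 0.1)]
        print(N0, gam, gam > threshold, ends)
```

With these values the run for $N_0 = 1$ returns to $\psi = 0$ from either side, while the run for $N_0 = 4$ settles at two values of opposite sign and equal magnitude, depending only on the sign of the initial fluctuation.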


FIGURE 2. The roots of $F(\psi) = 0$ for normal distributions, unbiased
population. (Axis label: $\gamma = a\sqrt{s^2+\sigma^2}/2AN_0$, from 0 to .4.)


Numerical values of $\psi^*$ can be calculated as follows. If we put
$\xi = \psi/\sqrt{s^2+\sigma^2}$, $F(\psi) = 0$ becomes
$G(\xi)/\xi = \gamma$. Suppose $y = G(x)/x$ and let $x = H(y)$ be the
inverse function for positive values of $x$; then
$\xi = \psi^*/\sqrt{s^2+\sigma^2} = H(\gamma)$. This relation is graphed
in Figure 2. For small $\gamma$, we have approximately

$$\psi^* = \frac{\sqrt{s^2+\sigma^2}}{2\gamma} = \frac{AN_0}{a}. \tag{20}$$

Since $\gamma$ is small for large $N_0$, equation (20) would hold for a
large population. At the other end of the range we can also obtain a
simple approximate formula for $\psi^*$. Let

$$\gamma = (2\pi)^{-1/2}(1 - \varepsilon), \tag{21}$$

where $\varepsilon$ is a small positive quantity; then

$$\psi^* = \left[6\varepsilon\,(s^2+\sigma^2)\right]^{1/2}. \tag{22}$$

The upper and lower dashed lines in Figure 2 show the accuracy of
these approximations.

It should be noted that the value of $\psi^*$ in (20) does not depend
on the distribution of $\phi$ (i.e. the parameters $\sigma$ and $s$),
but only on $A$, $a$ and $N_0$. These parameters are determined by the
effect of the tendency to imitate and the population size.
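The inverse function $H(\gamma)$ has no elementary closed form, but it is easy to evaluate numerically. The sketch below is an illustration only: it solves $G(\xi) = \gamma\xi$ for the positive root by bisection (a choice made here, not taken from the paper) and compares the result with the two approximations discussed above, written in the dimensionless forms $\xi \approx 1/(2\gamma)$ and $\xi \approx \sqrt{6\varepsilon}$ with $\varepsilon = 1 - \gamma\sqrt{2\pi}$. The name `xi_star` and the bracketing interval are assumptions of the example.

```python
import math

G0 = (2.0 * math.pi) ** -0.5  # g(0), the threshold value of gamma


def G(x):
    """Integral of the standard normal density from 0 to x."""
    return 0.5 * math.erf(x / math.sqrt(2.0))


def xi_star(gamma, iters=200):
    """Positive root of G(xi) = gamma * xi, valid for 0 < gamma < g(0)."""
    if not 0.0 < gamma < G0:
        raise ValueError("a positive root exists only for 0 < gamma < (2*pi)**-0.5")
    # G(xi) - gamma*xi is positive just above 0 and negative beyond 1/(2*gamma),
    # because G never exceeds 1/2.
    lo, hi = 1e-6, 0.5 / gamma + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if G(mid) - gamma * mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for gam in (0.01, 0.1, 0.3, 0.39):
        eps = 1.0 - gam / G0
        print(gam, xi_star(gam), 0.5 / gam, math.sqrt(6.0 * eps))
```

Multiplying the returned $\xi$ by $\sqrt{s^2+\sigma^2}$ gives $\psi^*$. The first approximation is accurate for small $\gamma$ and the second near the threshold, which is the behavior the dashed lines of Figure 2 are described as showing.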


Normal Distribution: Biased Population. 

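Suppose the population is biased in its distribution of $\phi$, the
characteristic net excitation to response $R_1$. We assume that instead
of (7) the distribution is given by

$$n(\phi) = \frac{1}{\sigma}\, g\!\left(\frac{\phi - \phi_0}{\sigma}\right), \tag{23}$$

where $\phi_0 > 0$ so that the population is biased in favor of $R_1$.
Then we have

$$X(\psi) = \frac{N_0}{\sigma} \int_{-\infty}^{\infty} \left[\frac{1}{2} + G\!\left(\frac{\phi+\psi}{s}\right)\right] g\!\left(\frac{\phi-\phi_0}{\sigma}\right) d\phi, \tag{24}$$

and

$$\begin{aligned}
I(\psi) &= \frac{1}{\sigma} \int_{-\infty}^{\infty} G\!\left(\frac{\phi+\psi}{s}\right) g\!\left(\frac{\phi-\phi_0}{\sigma}\right) d\phi \\
&= \frac{1}{\sigma} \int_{-\infty}^{\infty} G\!\left(\frac{\phi+\psi+\phi_0}{s}\right) g\!\left(\frac{\phi}{\sigma}\right) d\phi,
\end{aligned} \tag{25}$$

so that from equation (14) it follows that

$$I(\psi) = G\!\left(\frac{\psi+\phi_0}{\sqrt{s^2+\sigma^2}}\right). \tag{26}$$

The differential equation for $\psi$ now becomes

$$\frac{d\psi}{dt} = 2AN_0\, G\!\left(\frac{\psi+\phi_0}{\sqrt{s^2+\sigma^2}}\right) - a\psi = F(\psi). \tag{27}$$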


In this case $F(0) > 0$, but it is easy to see that there will always
be a positive root $F(\psi^*) = 0$ for $\psi^* > 0$ because

$$G\!\left(\frac{\psi+\phi_0}{\sqrt{s^2+\sigma^2}}\right) < \frac{1}{2},$$

while $a\psi$ increases indefinitely as $\psi$ increases. There is only
one positive root and it is stable since

$$\frac{dF}{d\psi} = a\left[\frac{1}{\gamma}\, g\!\left(\frac{\psi+\phi_0}{\sqrt{s^2+\sigma^2}}\right) - 1\right] \tag{28}$$

decreases with increasing $\psi$ for $\psi > 0$, and this together with
$F(0) > 0$ means that there can be only one positive root
$F(\psi^*) = 0$ and that $dF(\psi^*)/d\psi < 0$. This is the situation in
Curve I of Figure 3.

FIGURE 3. The three possibilities for $F(\psi)$ for normal distributions,
biased population.


If inequality (16), $\gamma > (2\pi)^{-1/2}$, holds, then from (28) we
see that $dF/d\psi < 0$ for all values of $\psi$, and the positive root
is the only one. For $\gamma < (2\pi)^{-1/2}$ it is possible, depending
on the value of $\phi_0$, to have one or two negative roots. The
condition under which these possibilities occur is derived below. That
these are the only two possibilities can be seen from equation (28),
which shows that $dF/d\psi = 0$ for at most one negative value of
$\psi$, and must be negative for large negative values of $\psi$, since
$dF/d\psi \to -a$ as $\psi \to -\infty$. Hence we have the two
situations shown in II and III of Figure 3. In II, $F(\psi) = 0$ and
$dF/d\psi = 0$ for the same value of $\psi$. Such a root is unstable. In
III, there are two negative roots and the one with the smaller absolute
value is unstable, but the other is stable.

The condition III can occur for sufficiently small $\gamma$ for any
value of $\phi_0$. If we interpret the bias $\phi_0$ as expressing the
desire of the population for $R_1$, this means that in a sufficiently
large population it is possible to have a stable condition in which the
majority behaves contrary to its own desire because of the effect of
imitation. An effect of this kind may be part of the explanation of how
the cooperation of the people of a country is obtained in an unwanted
war.

We can derive the condition for $F(\psi)$ to have negative roots as
follows. From the discussion above it can be seen that $\gamma$ must be
smaller than the value for which $F(\psi) = 0$ and $dF/d\psi = 0$ have
the same root. If we put

$$\xi = \frac{\psi}{\sqrt{s^2+\sigma^2}}, \qquad \xi_0 = \frac{\phi_0}{\sqrt{s^2+\sigma^2}}, \tag{29}$$

then $F(\psi) = 0$ and $dF/d\psi = 0$ become

$$G(\xi + \xi_0) - \gamma\xi = 0, \tag{30}$$

$$g(\xi + \xi_0) - \gamma = 0. \tag{31}$$

Hence we obtain

$$G(\xi + \xi_0) = \xi\, g(\xi + \xi_0). \tag{32}$$

The root $\xi$ of (30) and (31) gives the point at which
$G(\xi + \xi_0)/\xi$ has a maximum for negative $\xi$, and $\gamma$ must
be less than this maximum for negative roots of (30) to exist.


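To utilize (32) we need the following expansion of $G(x)$ (Rosser,
1948), which is obtained by successive integrations by parts:

$$\begin{aligned}
G(x) &= \int_0^x g(t)\, dt = x\, g(x) + \int_0^x t^2 g(t)\, dt \\
&= x\, g(x) + \frac{x^3}{3}\, g(x) + \frac{1}{3} \int_0^x t^4 g(t)\, dt \\
&= g(x) \left[ x + \frac{x^3}{1\cdot 3} + \frac{x^5}{1\cdot 3\cdot 5} + \cdots \right].
\end{aligned} \tag{33}$$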
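The expansion of $G(x)$ as $g(x)\,[x + x^3/(1\cdot 3) + x^5/(1\cdot 3\cdot 5) + \cdots]$ can be verified directly. The sketch below is only a numerical check, assuming $G(x) = \tfrac{1}{2}\operatorname{erf}(x/\sqrt{2})$ as implied by (6); the function name `G_series` and the number of terms are choices made for the example.

```python
import math


def g(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def G(x):
    """Integral of g from 0 to x, via the error function."""
    return 0.5 * math.erf(x / math.sqrt(2.0))


def G_series(x, terms=60):
    """Partial sum g(x) * sum_k x^(2k+1) / (1*3*5*...*(2k+1))."""
    term = x  # k = 0 term
    total = 0.0
    for k in range(terms):
        total += term
        # next term: multiply by x^2 and divide by the next odd number
        term *= x * x / (2 * k + 3)
    return g(x) * total


if __name__ == "__main__":
    for x in (-0.8, 0.5, 1.5, 3.0):
        print(x, G(x), G_series(x))
```

Because the ratio of successive terms is $x^2/(2k+3)$, the series converges for every $x$, though more terms are needed as $|x|$ grows.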


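Therefore from (32) it follows that

$$\xi = (\xi + \xi_0) + \frac{(\xi+\xi_0)^3}{1\cdot 3} + \frac{(\xi+\xi_0)^5}{1\cdot 3\cdot 5} + \cdots \tag{34}$$

or

$$\xi_0 = -\left[ \frac{(\xi+\xi_0)^3}{1\cdot 3} + \frac{(\xi+\xi_0)^5}{1\cdot 3\cdot 5} + \cdots \right]. \tag{35}$$

Solving (31) for $\xi + \xi_0$ gives

$$\xi + \xi_0 = -\left(-\log 2\pi\gamma^2\right)^{1/2}. \tag{36}$$

Hence from (35) we find

$$\xi_0 = \frac{1}{1\cdot 3}\left(-\log 2\pi\gamma^2\right)^{3/2} + \frac{1}{1\cdot 3\cdot 5}\left(-\log 2\pi\gamma^2\right)^{5/2} + \cdots. \tag{37}$$

This is the condition for the existence of a single unstable root. For
two negative roots to exist $\gamma$ must be smaller; that is, the
following inequality must hold:

$$\xi_0 < \frac{1}{1\cdot 3}\left(-\log 2\pi\gamma^2\right)^{3/2} + \frac{1}{1\cdot 3\cdot 5}\left(-\log 2\pi\gamma^2\right)^{5/2} + \cdots. \tag{38}$$

For $\phi_0 = 0$, inequality (38) of course reduces to the previous
inequality (18).

For small values of $\xi_0$, $\gamma$ must be near $(2\pi)^{-1/2}$ to
satisfy (37). Then we can use the first term of the series, and (38)
becomes

$$\gamma < g\!\left((3\xi_0)^{1/3}\right). \tag{39}$$

For large values of $\xi_0$, $\gamma \to 0$ and (38) becomes

$$\gamma < \frac{1}{2\xi_0} \qquad \text{or} \qquad \frac{a}{AN_0} < \frac{1}{\phi_0}. \tag{40}$$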


Formulae for the roots in this case would be rather complicated
but it is easy to determine them graphically. Letting

$$\eta = \xi + \xi_0 = \frac{\psi + \phi_0}{\sqrt{s^2+\sigma^2}}, \tag{41}$$

we need merely plot $G(\eta)$ against $\eta$ as abscissa and on the same
graph draw the line with ordinate $\gamma(\eta - \xi_0)$. If $\eta^*$ is
the abscissa of the point (or points) of intersection, then the root
$\psi^*$ is given by

$$\psi^* = \sqrt{s^2+\sigma^2}\; \eta^* - \phi_0.$$
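The graphical construction has a direct numerical counterpart. The sketch below is an illustration under stated assumptions rather than the author's procedure: it scans for sign changes of $G(\eta) - \gamma(\eta - \xi_0)$ on a grid and refines each one by bisection, and it evaluates the critical bias at which the two negative roots merge as $\xi_0 = \eta_c - G(\eta_c)/\gamma$ with $\eta_c = -(-\log 2\pi\gamma^2)^{1/2}$, which is the tangency condition $G(\eta) = \gamma(\eta - \xi_0)$, $g(\eta) = \gamma$ written out. The function names, grid size, and sample values of $\gamma$ and $\xi_0$ are hypothetical.

```python
import math


def g(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def G(x):
    """Integral of g from 0 to x."""
    return 0.5 * math.erf(x / math.sqrt(2.0))


def critical_bias(gamma):
    """Value of xi_0 at which the line is tangent to the curve G(eta)."""
    if not 0.0 < gamma < g(0.0):
        raise ValueError("tangency requires 0 < gamma < (2*pi)**-0.5")
    eta_c = -math.sqrt(-math.log(2.0 * math.pi * gamma * gamma))
    return eta_c - G(eta_c) / gamma


def intersections(gamma, xi0, samples=4000, iters=80):
    """Abscissae eta* where G(eta) meets the line gamma * (eta - xi0)."""
    def h(eta):
        return G(eta) - gamma * (eta - xi0)

    # Since |G| <= 1/2, every root satisfies |eta - xi0| <= 1/(2*gamma).
    half = 0.5 / gamma + 1.0
    start = xi0 - half
    step = 2.0 * half / samples
    roots = []
    left, h_left = start, h(start)
    for i in range(1, samples + 1):
        right = start + i * step
        h_right = h(right)
        if h_right == 0.0:
            roots.append(right)
        elif h_left * h_right < 0.0:
            a, b, ha = left, right, h_left
            for _ in range(iters):
                mid = 0.5 * (a + b)
                hm = h(mid)
                if ha * hm <= 0.0:
                    b = mid
                else:
                    a, ha = mid, hm
            roots.append(0.5 * (a + b))
        left, h_left = right, h_right
    return roots


def psi_from_eta(eta, s, sigma, phi0):
    """Convert an intersection abscissa eta* back to the root psi*."""
    return math.hypot(s, sigma) * eta - phi0


if __name__ == "__main__":
    gam = 0.2
    print("critical xi_0:", critical_bias(gam))
    for xi0 in (0.3, 1.0):
        print(xi0, intersections(gam, xi0))
```

For a bias below the critical value the scan finds three intersections (two with negative $\eta$ and one with positive $\eta$), and above it only the positive one, matching cases III and I of Figure 3.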


General Distributions. 


In this section we show that many of the conclusions in R and
in the previous sections can be established independently of any
assumed specific form for the distributions $n(\phi)$ and $p(\phi')$.
Equations (9) and (10) are independent of the form of these
distributions. From the definition of $X(\psi)$ as the number of
individuals exhibiting response $R_1$, it follows that

$$0 \le X(\psi) \le N_0 \tag{42}$$

and hence

$$-\frac{1}{2} \le I(\psi) \le \frac{1}{2}. \tag{43}$$

From the continuity of $I(\psi)$ and (9) it follows that $F(\psi) = 0$
has one or more roots, $\psi^*$, and that these roots lie in the interval

$$-\frac{AN_0}{a} \le \psi^* \le \frac{AN_0}{a}. \tag{44}$$

Furthermore, either $dF/d\psi$ must be negative at some root or else
$dF(\psi^*)/d\psi = 0$ with $F(\psi) < 0$ for $\psi > \psi^*$ and
$F(\psi) > 0$ for $\psi < \psi^*$,† so that there is a stable root.

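The bounds for this root in (44) are the limiting values in equation
(20). We can also show that as $N_0$ becomes large, there always are two
roots which asymptotically approach these limits. From (3) and (5) it
can be seen that as $\psi \to \infty$, $X(\psi) \to N_0$ and,
consequently, $I(\psi) \to \tfrac{1}{2}$. That is, for any
$\varepsilon > 0$ there is a $\psi_\varepsilon > 0$ such that
$(1-\varepsilon)/2 < I(\psi) \le \tfrac{1}{2}$ for
$\psi > \psi_\varepsilon$. Hence

$$F(\psi) = 2AN_0\, I(\psi) - a\psi > 0 \quad \text{for} \quad \psi_\varepsilon < \psi < \frac{AN_0}{a}(1-\varepsilon)$$

and

$$F(\psi) < 0 \quad \text{for} \quad \psi > \frac{AN_0}{a},$$

from which it follows that for sufficiently large $N_0$ there must be a
stable positive root $\psi^*$ satisfying

$$\frac{AN_0}{a}(1-\varepsilon) < \psi^* \le \frac{AN_0}{a}. \tag{45}$$

By considering $\psi \to -\infty$, we find a similar negative root.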

The significance of this result lies in the demonstration that 
imitation makes it possible, in a sufficiently large population, to have 
a stable behavior pattern which is entirely independent of the desires 
of the population or of its inherent pattern of response. 

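We now assume that $n(\phi)$ and $p(\phi')$ are symmetric or even
functions; that is,

$$n(-\phi) = n(\phi) \quad \text{and} \quad p(-\phi') = p(\phi'),$$

so that

$$\int_0^{\infty} n(\phi)\, d\phi = \frac{1}{2} \quad \text{and} \quad P_1(-\phi) = 1 - P_1(\phi). \tag{46}$$

Then from (5) and (10) we find

$$I(\psi) = -\frac{1}{2} + \int_{-\infty}^{\infty} P_1(\phi + \psi)\, n(\phi)\, d\phi. \tag{47}$$

Dividing the integral into two parts and changing the sign of the
integration variable we obtain

$$\begin{aligned}
I(\psi) &= -\frac{1}{2} + \int_{-\infty}^{0} P_1(\phi+\psi)\, n(\phi)\, d\phi + \int_{0}^{\infty} P_1(\phi+\psi)\, n(\phi)\, d\phi \\
&= -\frac{1}{2} + \int_{0}^{\infty} P_1(-\phi+\psi)\, n(\phi)\, d\phi + \int_{0}^{\infty} P_1(\phi+\psi)\, n(\phi)\, d\phi \\
&= -\frac{1}{2} + \int_{0}^{\infty} \left[ P_1(\phi+\psi) + 1 - P_1(\phi-\psi) \right] n(\phi)\, d\phi.
\end{aligned}$$

From the above we finally obtain

$$I(\psi) = \int_{0}^{\infty} \left[ P_1(\phi+\psi) - P_1(\phi-\psi) \right] n(\phi)\, d\phi. \tag{48}$$

This shows that in the present case

$$I(\psi) = -I(-\psi) \tag{49}$$

and $I(0) = 0$. Hence there is a root $F(0) = 0$.

†There might be a whole interval of roots, $F(\psi) = 0$ for
$\psi_1^* \le \psi \le \psi_2^*$, with $dF(\psi_1^{*-})/d\psi < 0$ and
$dF(\psi_2^{*+})/d\psi < 0$. Then every point in the interval is a
stable root.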

Now

$$I'(\psi) = \int_{0}^{\infty} \left[ p(-\phi-\psi) + p(-\phi+\psi) \right] n(\phi)\, d\phi
= \int_{0}^{\infty} \left[ p(\phi+\psi) + p(\phi-\psi) \right] n(\phi)\, d\phi, \tag{50}$$

and

$$I'(0) = 2 \int_{0}^{\infty} p(\phi)\, n(\phi)\, d\phi > 0, \tag{51}$$

because $p(\phi) \ge 0$ and $n(\phi) \ge 0$ and neither can be
identically zero. The root at $\psi^* = 0$ will be stable or unstable
according to whether

$$F'(0) = 2AN_0\, I'(0) - a$$

is negative or positive. Hence

$$I'(0) < \frac{a}{2AN_0} \tag{52}$$

is the condition for $\psi^* = 0$ to give a stable solution. When
applied to the normal distribution this leads to inequality (16). In
this case we cannot state that there may not be additional stable
roots; however, if (52) does not hold then from the reasoning above and
the fact that $F(\psi)$ is an odd function, as shown by (49), it
follows that there is at least one stable positive root and also at
least one stable negative root of equal absolute magnitude.

Now in addition to requiring that $n(\phi)$ and $p(\phi')$ be even
functions, let us also require that they be unimodal, that is,

$$\begin{aligned}
n(\phi_1) &\le n(\phi_2) \quad \text{for} \quad |\phi_1| > |\phi_2|, \\
p(\phi_1') &\le p(\phi_2') \quad \text{for} \quad |\phi_1'| > |\phi_2'|.
\end{aligned} \tag{53}$$

With these additional restrictions on the distributions it is possible
to show that the alternatives found for the normal distribution are
the only possibilities; that is, either there is a single stable root at
$\psi = 0$ or the root at $\psi = 0$ is unstable and there are two
stable roots $\pm\psi^* \ne 0$. This results from the fact that under
these conditions $I'(\psi)$ is also unimodal, as we now show. That
$I'(\psi)$ is an even function is apparent from (50). From (47) we have

$$I'(\psi) = \int_{-\infty}^{\infty} p(\phi+\psi)\, n(\phi)\, d\phi, \tag{54}$$

so that

$$I'(\psi+h) = \int_{-\infty}^{\infty} p(\phi+\psi+h)\, n(\phi)\, d\phi = \int_{-\infty}^{\infty} p(\phi+\psi)\, n(\phi-h)\, d\phi, \tag{55}$$

and

$$I'(\psi-h) = \int_{-\infty}^{\infty} p(\phi+\psi)\, n(\phi+h)\, d\phi. \tag{56}$$

This gives

$$\Delta I'(\psi) = I'(\psi+h) - I'(\psi-h) = \int_{-\infty}^{\infty} p(\phi+\psi) \left[ n(\phi-h) - n(\phi+h) \right] d\phi. \tag{57}$$

Since $I'(\psi)$ is even we also have

$$I'(\psi) = \int_{-\infty}^{\infty} p(\phi-\psi)\, n(\phi)\, d\phi, \tag{58}$$

and in the same way as (57) was obtained we find

$$\Delta I'(\psi) = \int_{-\infty}^{\infty} p(\phi-\psi) \left[ n(\phi+h) - n(\phi-h) \right] d\phi. \tag{59}$$

Therefore, adding (57) and (59) and halving, we finally have

$$\Delta I'(\psi) = \frac{1}{2} \int_{-\infty}^{\infty} \left[ p(\phi-\psi) - p(\phi+\psi) \right] \left[ n(\phi+h) - n(\phi-h) \right] d\phi. \tag{60}$$

In this formula we take $h$ and $\psi$ to be positive; then for
$\phi \ge 0$, (53) shows the second bracket to be $\le 0$ while the
first bracket is $\ge 0$, hence the product is $\le 0$. When
$\phi < 0$ the signs of the two brackets are both reversed so the
product is again $\le 0$. Hence $\Delta I'(\psi) \le 0$ for
$\psi > 0$, which completes the proof that $I'(\psi)$ is unimodal.
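The unimodality of $I'(\psi)$ can be illustrated numerically for distributions other than the normal. The sketch below is only an example under assumptions chosen here: it takes $I'(\psi) = \int p(\phi+\psi)\, n(\phi)\, d\phi$ as in (54), uses Laplace densities with hypothetical rates 1 and 2 for $n$ and $p$ (both even and unimodal), and evaluates the integral by the trapezoid rule.

```python
import math


def laplace(x, rate):
    """Even, unimodal Laplace density (rate/2) * exp(-rate * |x|)."""
    return 0.5 * rate * math.exp(-rate * abs(x))


def I_prime(psi, n, p, half_width=20.0, steps=20000):
    """Trapezoid estimate of the integral of p(phi + psi) * n(phi) over phi."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        phi = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * p(phi + psi) * n(phi)
    return total * h


def n_demo(x):
    """Hypothetical population distribution n (rate 1)."""
    return laplace(x, 1.0)


def p_demo(x):
    """Hypothetical fluctuation distribution p (rate 2)."""
    return laplace(x, 2.0)


if __name__ == "__main__":
    for k in range(-4, 5):
        psi = 0.5 * k
        print(psi, I_prime(psi, n_demo, p_demo))
```

The printed values are symmetric about $\psi = 0$ and fall off monotonically on either side, as the argument based on (60) requires; the value at $\psi = 0$ is the quantity that enters the stability condition (52).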

The general distribution with bias can also be treated. Suppose
again that $n(\phi)$ and $p(\phi')$ are even functions, but that

$$N(\phi) = N_0\, n(\phi - \phi_0)$$

with $\phi_0 > 0$. Then the value of $I$ is

$$\begin{aligned}
I_0(\psi) &= -\frac{1}{2} + \int_{-\infty}^{\infty} P_1(\phi+\psi)\, n(\phi-\phi_0)\, d\phi \\
&= -\frac{1}{2} + \int_{-\infty}^{\infty} P_1(\phi+\psi+\phi_0)\, n(\phi)\, d\phi = I(\psi+\phi_0),
\end{aligned} \tag{61}$$

where $I(\psi)$ is the value of $I$ for $\phi_0 = 0$. From the reasoning
above there is always at least one stable positive root and there will
be just one if $n(\phi)$ and $p(\phi')$ are unimodal. In the latter case
there may be just one unstable negative root,
$\psi = -\psi_1^* < -\phi_0$, and also just one stable negative root,
$\psi = -\psi_2^* < -\psi_1^*$. Obviously for a stable negative root we
must have $a/2AN_0 < I_0'(-\psi_1^*)$, and since
$I_0'(-\psi_1^*) < I_0'(0) = I'(\phi_0)$ we have as a necessary, but not
sufficient, condition for a stable negative root

$$\frac{a}{2AN_0} < I'(\phi_0). \tag{62}$$
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A REMARK ON LANDAHL’S THEORY OF LEARNING 
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It is pointed out that the equations of Landahl’s learning theory may 
be formally interpreted in terms of a different neural network than 
that considered originally by Landahl. The suggested interpretation is 
based on this author’s theory of elimination of a wrong act through a 
delayed conditioned reflex which produces a reaction opposite to the 
wrong act. 


H. D. Landahl’s theory of learning (Landahl, 1941; Rashevsky,
1948) is based on the assumption of the neural network shown in
Figure 1. The correct response $R_c$ produces a change $R_1$ in the
environment, which, acting itself as a stimulus, results in an increase
of $\varepsilon$ at the connection $s_c$. Similarly, the wrong response
$R_w$ via a change $R_2$ results in an increase of $j$ at $s_w$.


FIGURE 1 


If interpreted too literally, the network of Figure 1 may lead
to the misconception that the correct and wrong responses are
already predetermined by the structure of the network, since the
structure shown in Figure 1 permits only excitation of $s_c$ and
inhibition of $s_w$. Actually, $s_c$ receives pathways from a number of
circuits $C$: $C_1, C_2, \ldots, C_n$. Some of those pathways are
excitatory, some inhibitory. We shall call a circuit, $C_i$, which sends
an excitatory pathway to $s_c$ an excitatory circuit. A circuit $C_k$
which sends an inhibitory pathway to $s_c$ will be called an inhibitory
circuit. All the circuits are stimulated by appropriate stimuli $R_1$:
$R_{11}, R_{12}, \ldots, R_{1n}$. The connection between $R_c$ and $R_1$
is not a part of the network but part of the environment which is under
the control of the experimenter. If $R_c$ produces an $R_{1i}$ which
excites an excitatory circuit $C_i$, then $R_c$ is a correct response.
Otherwise, it is a wrong one. Thus $R_c$ may be the entry into a given
alley. But it is left to the experimenter to put either food (an
$R_{1i}$ which produces excitation at $s_c$ via an excitatory circuit
$C_i$) or a device for an electric shock (an $R_{1k}$ which produces
inhibition at $s_c$ via an inhibitory circuit $C_k$) at the end of the
alley. The same holds regarding the connection $s_w$ and the reaction
$R_w$.

The purpose of this note is to show that formally the same re- 
sults as in H. D. Landahl’s theory are obtained by considering a pre- 
viously discussed mechanism of elimination of a wrong act (Rashev- 
sky, 1936; 1948). That theory is based on the idea that a wrong 
act results in a stimulus which produces a reaction opposite to the 
wrong reaction. Thus a dead end alley results in a return of the 
animal, that is, in a locomotion in the direction opposite to that of 
the original locomotion into the alley. This opposite reaction even- 
tually develops as a delayed reflex to the original stimulus which 
produced the wrong reaction and, thus, reduces that wrong reaction 
in intensity. But this reduction in intensity is due not to an
inhibition along the chain of pathway $s_w$ to $R_w$, but to a purely
physical mutual weakening of two opposite reactions.

Now consider the network shown in Figure 2. Here the synapses
$s_1$ and $s_2$ are inhibited by pathways which are stimulated not by
$S_1$ and $S_2$, as in Landahl’s well known network for reciprocal
inhibition, but by the reactions $R_1$ and $R_2$. For stationary states
we have the same situation as in Landahl’s standard case; namely, no
reaction for $|S_1 - S_2| < h$, where $h$ is the threshold of the
system; reaction $R_1$ for $S_1 - S_2 > h$; and reaction $R_2$ for
$S_2 - S_1 > h$. The application of the theory of random fluctuations
leads to results similar to those in Landahl’s case. Now if one of the
reactions, say $R_1$, is the wrong one, it will be reduced by the
process described above, and this will enhance the other reaction
$R_2$, by reducing the inhibition at $s_2$.

The original theory of error elimination, mentioned above, does 
not provide for an increase in the correct response. This, however, 
may be easily obtained by a similar mechanism. 

It may be noticed that learning will occur in Landahl’s theory 
even without reinforcement of the correct response, but merely with 
a gradual inhibition of the wrong one. This amounts to putting the 
constant b (Rashevsky, 1948, p. 470) equal to zero. This cannot 
be done in the final equation [Rashevsky, 1948, p. 471, equation (7) ] 
for reasons which the reader will readily see if he follows step by 
step the procedure outlined on p. 471 of loc. cit. However, if we 
put b = 0 from the beginning, we have, as in loc. cit., 


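$$\frac{dw}{dn} = \frac{1}{2}\, e^{-k(\varepsilon_c - \varepsilon_w)}. \tag{1}$$

Using the same notations as in loc. cit. and putting, without loss of
generality, $\varepsilon_{0c} - \varepsilon_{0w} = 0$, we find

$$\frac{dw}{dn} = \frac{1}{2}\, e^{-k\beta w} \tag{2}$$

or

$$e^{k\beta w}\, dw = \frac{1}{2}\, dn. \tag{3}$$

Hence,

$$\frac{e^{k\beta w}}{k\beta} = \frac{n}{2} + C, \tag{4}$$

where $C$ is determined from the condition that for $n = 0$, $w = 0$.
Hence $C = 1/k\beta$. Therefore, equation (4) gives

$$w = \frac{1}{k\beta} \log\left(1 + \frac{k\beta n}{2}\right). \tag{5}$$

The cumulative number of errors increases in this case indefinitely,
although $dw/dn$ tends to zero. The ratio $w/n$ decreases to zero, so
that the task is learned better and better.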
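The closing remarks can be checked with a short computation. The sketch below is illustrative only: it takes the error curve in the form $w = (1/k\beta)\log(1 + k\beta n/2)$ and the rate law $dw/dn = \tfrac{1}{2}e^{-k\beta w}$, with $\beta$ denoting the inhibition constant as reconstructed from the scan, and compares the closed form with a forward-Euler integration of the rate law. The parameter values and function names are hypothetical.

```python
import math


def w_closed(n, k, beta):
    """Cumulative errors after n trials: log(1 + k*beta*n/2) / (k*beta)."""
    return math.log(1.0 + 0.5 * k * beta * n) / (k * beta)


def w_euler(n_max, k, beta, dn=0.001):
    """Forward-Euler integration of dw/dn = 0.5 * exp(-k*beta*w), w(0) = 0."""
    w = 0.0
    for _ in range(int(round(n_max / dn))):
        w += dn * 0.5 * math.exp(-k * beta * w)
    return w


if __name__ == "__main__":
    k, beta = 1.0, 0.5
    for n in (10, 100, 1000, 10000):
        w = w_closed(n, k, beta)
        print(n, w, w / n)
    print(w_euler(50.0, k, beta), w_closed(50.0, k, beta))
```

The first loop shows $w$ growing without bound while $w/n$ shrinks toward zero; the last line shows the Euler estimate agreeing with the closed form to within the step error.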
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CONTRIBUTIONS TO THE MATHEMATICAL BIOPHYSICS OF 
THE CENTRAL NERVOUS SYSTEM WITH SPECIAL 
REFERENCE TO LEARNING 
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A learning theory based on the lowering of thresholds of neurons 
under certain conditions is applied to two “random net” models. The 
first, a so-called “ganglion-brain” is characterized by completely ran- 
dom connections of all afferent tracts except certain ones which form 
the pathways for unconditioned responses. Certain expressions are de- 
rived which measure the learning potentiality of the ganglion — in 
particular, with respect to the number of responses which can be learned 
(conditioning potential) and the amount of interference between the 
learned responses (redundance potential). 

The second model concerns the progressive refinement of a response. 
The efficiency of learning in this case is reflected in the eventual specifi- 
city of the response which, in turn, depends on the modification of the 
distribution of thresholds associated with the neurons governing the re- 
sponses. Expressions are derived relating the initial distribution of 
thresholds, the relative effectiveness of the various responses, and cer- 
tain other parameters to the final distribution of thresholds. For a par- 
ticular choice of the effectiveness distribution of responses the pro- 
gressive sharpening of the threshold curve (i.e., progressive specificity 
of response) is demonstrated. Some implications of the model with re- 
spect to the evolution of nervous systems are discussed. 


It is now a rather commonly held hypothesis that the behavior 
of organisms possessing nervous systems (especially the more com- 
plex ones) will be greatly elucidated when the dynamics of neural tis- 
sue is better understood. Furthermore, it is held by many that one of 
the most fundamental aspects of neural dynamics depends in essence 
upon the topology of the connections (synapses) of neurons with one 
another, and the communication problems which such connections 
imply. These synapses and the patterns of neuron relations which 
they form (neural nets, as they are often called) are assumed to be 
of the greatest importance in all psychological phenomena. From 
this point of view the similarities between neural nets and the con- 
nections of vacuum tubes to one another in electronic computers, and 
the corresponding “intelligences” exhibited by these systems, are tak- 
en to be somewhat more than analogical. 
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N. Rashevsky, starting essentially with this point of view, was 
one of the earliest workers to give impetus to these notions and to 
systematically study their implications. This fundamental work is 
summarized in his book Mathematical Biophysics. Important ad- 
vances of this approach made by H. D. Landahl and A. S. House- 
holder are compiled in a monograph entitled Mathematical Biophysics 
of the Central Nervous System, wherein can also be found a summary 
of the ideas contained in a paper written by W. S. McCulloch and W. 
Pitts in 1943. The essential goal of that paper was an attempt to 
use the language of symbolic logic for the purpose of describing the 
behavior of neural nets of all kinds, where only certain “initial
conditions” are known. Given information about the “state” of the
neural net at some specified time, the method enables one to calculate
all subsequent “states” of the net.

Although the attempt of McCulloch and Pitts represents an im- 
portant step forward in neural net theory it has some serious short- 
comings. For one thing, its definitive results are not generally applic- 
able (at least in their present form) to neural nets containing cycles, 
i.e., to nets having closed pathways. Inasmuch as such networks are 
of general occurrence (Lorente de No, 1934), any mathematical tech- 
nique purporting to describe neural dynamics cannot be considered 
adequate without subsuming such phenomena. Another shortcoming 
of the symbolic logic approach is the extremely laborious computa- 
tions which must be made in order to analyze systems containing 
large numbers of neurons. Still another objection is the assumption 
that the system is somehow “locked” in phase, i.e., it is assumed that 
if any neuron in the system fires at some time $t = t_0$, then any other
neuron in the system can fire either at exactly that same time or at
some future time $t = t_0 + n\sigma$ where $\sigma$ is taken as a constant and $n$ is
any positive integer. There has been some suggestion that this diffi- 
culty could be overcome by a slight modification of the mathematics. 
However, no such extension has appeared in the literature so that the 
point is still open to question. 

The last and perhaps most serious shortcoming of the method 
of McCulloch-Pitts lies in the fact that it assumes an extremely de- 
tailed knowledge of the neural pattern under consideration. In fact, 
the method requires exact information concerning the relation of 
every neuron in the system to every other neuron. The gathering of 
such information concerning any one nervous system is a monumen- 
tal task of incredible proportions and even if ever achieved would 
undoubtedly be inapplicable to any other system. Also, it is required 
that the threshold of every neuron in the system be known, the other
parameters being taken as equal for all neurons. Finally, given all
of this information, the “state” of the system at some specified time
$t = t_0$ is also required. This means that we must know of every
neuron in the system whether or not it is firing at some time
$t = t_0$, which is an impossible task.

These objections to the McCulloch-Pitts formulation, though 
they are indeed quite serious, cannot negate its importance as a for- 
mal approach to the subject. 

In 1948 the author, together with A. Rapoport, published a pa- 
per in which was laid down a somewhat different approach to this 
problem. In this and subsequent papers an attempt was made to use 
the notions of probability in studying the structure and function of 
the nervous system. Many of the ideas developed in those papers 
thus parallel the development of cybernetics by N. Wiener and his 
associates and of the mathematical theory of communication by 
Shannon. 

In essence, the problem which was encountered was that of in- 
venting variables which could be used to characterize the state of a 
nervous system at any given time but which did not require any ex- 
tremely specific knowledge concerning the disposition of the individ- 
ual neurons of the system. Having defined such variables the next 
step was to show some of the necessary relations which they must 
bear to one another. 

In order to mathematically symbolize the problem and to develop 
a formulation it was necessary to define certain components of nerv- 
ous tissue in a highly schematic manner. These concepts are, of 
course, only approximations to actual nervous systems. Although 
the immediate application of such a formulation to the study of real 
nervous systems is tempered by the error incurred in its approxi- 
mations and the complexity of the mathematics which it implies, the 
questions which it raises and its usefulness as a guide for further 
research are the ultimate measures of its value. These depend upon 
the future. 

Let us briefly examine some of the ideas presented in this
probabilistic approach. To begin with, those parameters which are
associated with neurons as such (threshold, synaptic delay, refractory
period, etc.) are assumed to vary from neuron to neuron according to
some distribution function which, in general, would have to be
determined empirically. The shapes of the distribution functions of these
various parameters are assumed, in general, to vary from region to 
region in the neural net. The patterns resulting from the various neu- 
rons synapsing upon one another are characterized not by the dis- 
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position of each neuron in the system, but rather by certain func- 
tions of point pairs in space. These functions denote the probability 
that a neuron in a macroscopically small region about one of the 
points receives an axone from a neuron in a macroscopically small 
region surrounding the second point. The system is thereby viewed 
as an aggregate of neurons characterized by continuous (or step- 
wise) distributions of “local” properties together with certain “ten- 
dencies” of connection from region to region. A single neuron ac- 
cording to this picture, or even a small group of neurons, is not, in 
general, a determining factor in the “behavior” of the net. This 
seems more in accordance with the behavior of actual nervous sys- 
tems wherein even widespread lesions often do not seriously impair 
the performance of the tissue as a whole. 

Some mention should be made of the kinds of problems which 
these methods imply and attempt to deal with. They are essentially 
of three broad categories. 


I. THE DYNAMICS OF NEURON INTERACTION.


This category of problems has already been briefly discussed in
the foregoing. Several kinds of “sub-problems” have been approached
from this point of view.


A. Input-Output Relations in Nerve-Ganglion Systems. Solutions
for some special cases of this general problem have appeared in the
literature (Shimbel, 1949; Rapoport, 1950a). Briefly the problem is as
follows. Given an aggregate of, say, N neurons (see Figure 1) which is
characterized by the kinds of parameters discussed in the foregoing and
given also the frequency of action potentials occurring in the incoming
bundle per axone per unit time (i.e., the input), determine the
frequency of action potentials occurring in the outgoing bundle per
axone per unit time (i.e., the output) as a function of the
distributions of the various parameters in the ganglion.

FIGURE 1. (Diagram labels: incoming bundle, N fibers; ganglion, N cell
bodies; outgoing bundle, N fibers.)


B. The Converse Problem. Given a certain general relation 
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between the output and the input (not necessarily a specified func- 
tional relation) such as the existence of a maximum or the magnitude 
of the output as compared to the input at certain critical points, what 
relations (for example, inequalities) must be satisfied by the para- 
meters for the desired relations to hold? 


C. Time Series. If the input is considered as a time series 
(Shimbel, 1949) what distortion effects will appear as a result of a 
given distribution of parameters in the ganglion? 


D. The Converse Problem. If the ganglion is considered as a 
transformation on a time series, what distributions of parameters 
will do a given job of transformation? 


Having examined some of the properties of ganglia as to their 
effects on input-output relations we may perhaps be able to inter- 
pret more complex patterns of activity in terms of interactions be- 
tween such “ganglion-elements.” 

In particular, the class of problems that has been designated 
as homeostatic problems may possibly be viewed in these terms. 


II. “HYSTERESIS” IN NEURAL NETS.


There are a number of well-defined problems which involve 
changes in the structure (and, consequently, the function) of neural 
networks as a result of the activity which occurs in them. Two ques- 
tions arise in connection with such problems. 

A. By what mechanisms does the activity of a neural net re- 
sult in changes of its structure? 

B. How, in general, do the changes in the structure of a neural 
net affect its dynamics? 

The problem of learning evidently constitutes a special category 
of such problems, and perhaps its most important category. Other 
such phenomena, for example, fatigue and hyperexcitability may also 
be included. 


III. THE EVOLUTION OF NERVOUS SYSTEMS.


The problem of evolution of nervous systems as it appears from 
the point of view implicit in neural net theory is still rather poorly 
defined. Some ideas on the question, though still quite vague and 
speculative, have become themes for conversation among workers 
in the field, although very few have become sufficiently well-formu- 
lated for publication. (See, however, Rapoport, 1950b.) A few quali- 
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tative remarks concerning this question have also been included in 
this paper. 


“MOLECULAR” THEORIES OF LEARNING 


In many ways psychology and, in particular, learning theory 
bear a relationship to neural net theory which strongly suggests the 
relation which thermodynamics bears to kinetic theory. This com- 
parison of psychology to thermodynamics and neural net theory to 
kinetic theory, if it serves no other purpose, at least expresses the 
hope that the “high order” concepts of psychology are essentially 
reducible to the language of communication theory. In what follows 
we will examine some of these reductionist attempts with respect to 
learning theory and briefly outline some possible alternatives. 


FIGURE 2. (Labels: U.S., unconditioned stimulus; C.S., conditioned stimulus; R, response.)


It has become standard practice to illustrate the principles of 
learning in “higher” animals by means of diagrams such as that shown 
in Figure 2. U.S. represents an unconditioned stimulus which invari- 
ably leads to a certain response R. C.S. represents a stimulus which 
before the learning process did not elicit the response R, but which
thereafter is able to do so. The arrows, as used in the older psychologi- 
cal literature, chiefly served the purpose of symbolizing the above- 
mentioned relations among U.S., C.S., and R. 

In the course of time, along with the development of neuro- 
physiology and neural anatomy, the arrows in such diagrams began 
to take on a more definite “neural” meaning. Quite commonly now
they are said to represent “neural pathways” or chains of neurons. 

A great deal of anatomical and neurophysiological evidence has 
been accumulated which supports such a neural interpretation of 
“learning” diagrams. The details of such evidence are available in 
any modern textbook of neurophysiology and need not be presented
here.

Let us start then with the assumption that a certain organism 
is capable of “detecting” a specific stimulus (call it U.S.), and that 
the “information” so detected is transmitted via a chain of neurons 
or a bundle of such chains (which, in general, may also have
interconnections) to certain motor organs which then respond to this
information (call this response R). Let us assume furthermore that 
this organism is capable of detecting a different stimulus (call it 
C.S.) which does not evoke the response given to U.S. Finally, we 
will assume that after these two stimuli are simultaneously or “near- 
ly” simultaneously presented to the organism for a certain number 
of “trials,” the stimulus C.S. alone becomes sufficient to evoke R and 
the information is again neurally transmitted. All of this is, of 
course, nothing more than a typical conditioning phenomenon. 

How can the appearance of this “new pathway” be accounted 
for? Many attempts have been made to give this question a satis- 
factory answer. It has been suggested, for example, that the neu- 
rons of the C.S. bundle “grow” axones which synapse upon the neu- 
rons in the pathway from U.S. to R. This is the so-called theory of 
“neurobiotaxis.” No such growth has ever been observed.

Another suggestion was that the necessary connections were 
already anatomically present and were absent only in a functional 
sense. The conditioning process, according to this point of view, 
serves merely to make these non-functional pathways operative. 

If this is so, then we can ask: What is the mechanism by means 
of which such “non-functional” neural pathways can become operative
through experience?


Reverberating Circuits. 


N. Rashevsky (1938) described an interesting mechanism
which accounts for at least the main features of this phenomenon.
The simplest form of this mechanism is illustrated in Figure 3.


FIGURE 3


Its function is essentially as follows. Assume that the thresholds of all
the neurons are equal to two bulbs, that is, at least two end bulbs 
terminating on a cell body must be “simultaneously” active in order 
to “fire” the neuron in question. It follows that each time U.S. occurs 
the large neuron leading to R will fire because the neuron stimulated 
by U.S. has two bulbs terminating upon it. Note that the axone of 
the neuron stimulated by U.S. also has a branch which leads to a 
closed circuit of two mutually synapsed neurons but that this branch
ends upon a neuron in the closed circuit with only one end bulb. 
Therefore, it is not capable of stimulating this neuron since all 
thresholds are assumed to be two bulbs. Note also that if C.S. occurs 
alone it will not fire either the neuron leading to R or the neuron in 
the closed circuit. Here again the insufficiency is due to the fact that 
the axone terminates on these neurons with only one bulb each, i.e., 
one bulb less than threshold. 

Finally, observe that if U.S. and C.S. occur simultaneously (i.e., 
within the period of latent addition), then two things will happen. 


1. The large neuron leading to R will be excited. 


2. The effects of the neuron leading from U.S. will sum with 
the effect of that leading from C.S., and as a result the closed circuit 
neuron will be excited. 


Note that the closed circuit once stimulated will continue in- 
definitely. (Thus the name reverberating circuit.) Now if C.S. oc- 
curs alone, its effect on the neuron leading to R will sum with the 
effect of the branches of the closed circuit on the neuron leading to 
R, so that C.S. will now alone be a sufficient stimulus for R.
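
The behavior just described can be traced with a small discrete-time sketch. The wiring below is an assumption read from the prose, since the figure itself is not reproduced: two circuit neurons (here called A and B, hypothetical names) excite each other with two bulbs each, and each sends a one-bulb branch to the neuron leading to R. The synchronous update rule and the function names are likewise illustrative only.

```python
THRESHOLD = 2  # every neuron fires when at least two end bulbs are active

# BULBS[target][source] = number of end bulbs from source on target.
# Assumed wiring: A and B form the closed circuit, and each sends a
# one-bulb branch to the neuron leading to R.
BULBS = {
    "R": {"US": 2, "CS": 1, "A": 1, "B": 1},
    "A": {"US": 1, "CS": 1, "B": 2},
    "B": {"A": 2},
}


def step(active):
    """Return the neurons that fire next, given those active now."""
    return {
        target
        for target, sources in BULBS.items()
        if sum(n for src, n in sources.items() if src in active) >= THRESHOLD
    }


def run(stimuli):
    """For each tick's set of stimuli, report whether R fires."""
    internal = set()
    r_fired = []
    for stim in stimuli:
        internal = step(internal | set(stim))
        r_fired.append("R" in internal)
    return r_fired


if __name__ == "__main__":
    trials = [{"CS"}, {"US"}, {"US", "CS"}, set(), {"CS"}]
    print(run(trials))  # [False, True, True, False, True]
```

Under these assumptions C.S. alone fails before the paired presentation and succeeds after it, because the circuit, once started, keeps supplying the one missing bulb.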

This, then, in its barest essentials is the principle of Rashevsky’s 
model. Actually it has been much elaborated and refined and rather 
successfully applied to many aspects of learning theory. Further de- 
tails can be found in Rashevsky’s Mathematical Biophysics. 


Synaptic “Resistance” Theories. 


The so-called “synaptic resistance” theory is a possible alternative
to Rashevsky’s model. This theory is based on the conjecture
that the frequent passage of impulses across a synapse tends to low- 
er its threshold and thereby make the passage of an impulse “easier.” 
Figure 4 illustrates how this idea can be adapted to the problem of
conditioning.


FIGURE 4


In this model, as before, all thresholds are assumed initially to
be two bulbs. Clearly, then, U.S. is capable of eliciting R. However,
C.S. cannot excite the R neuron directly and has only one bulb
terminating on the internuncial labeled I so that it has no evident effect
on the system. However, note that if C.S. and U.S. are presented 
together, then not only will R fire but also I will be induced to fire 
by the combined action of the two stimuli. The theory implies that 
if C.S. and U.S. are presented together frequently enough (i.e., if I 
fires frequently enough), the threshold of I will be lowered so that 
eventually C.S. will become a sufficient stimulus for R. 


A Simpler Model. 


There are, of course, many other so-called “molecular” theories 
of learning. Any standard textbook of physiological psychology con- 
tains brief reviews of the more prominent ones. In what follows, 
however, we will be concerned with what may perhaps be viewed as 
a simpler version of the model just discussed. The first model to be 
considered is depicted in Figure 5. 


FIGURE 5


Here again, if we take the threshold of the neuron leading to R 
as two bulbs, then it is seen that U.S. is a sufficient stimulus for R. 
Similarly, it is clear that at the outset C.S. is inadequate. Now it is 
assumed that if U.S. and C.S. are frequently presented in temporal
contiguity, eventually the threshold of the R neuron will fall below 
“normal” until finally C.S. will become a sufficient stimulus for R. 
It must be made clear that U.S. is assumed to deliver a superthresh- 
old impulse, whereas C.S. is assumed to be subthreshold. 

Our hypothesis explicitly requires that the threshold of a neuron 
will be lowered if a super-threshold and a subthreshold stimulus im- 
pinge upon it in sufficiently close temporal contiguity. 
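
The hypothesis can be illustrated by a minimal sketch of the R neuron of Figure 5. The text does not say how fast the threshold falls, so the fixed decrement per paired trial, the floor on the threshold, and all names below are assumptions made only for illustration.

```python
class RNeuron:
    """Neuron leading to R, with a threshold that conditioning can lower."""

    US_BULBS = 2  # U.S. alone is superthreshold at the outset
    CS_BULBS = 1  # C.S. alone is subthreshold at the outset

    def __init__(self, threshold=2.0, decrement=0.25, floor=1.0):
        self.threshold = threshold
        self.decrement = decrement  # assumed drop per paired trial
        self.floor = floor          # assumed lower bound on the threshold

    def present(self, us=False, cs=False):
        """Present stimuli for one trial; return True if R occurs."""
        drive = (self.US_BULBS if us else 0) + (self.CS_BULBS if cs else 0)
        fired = drive >= self.threshold
        if us and cs:
            # Hypothesis of the text: contiguity of a superthreshold and a
            # subthreshold stimulus lowers the threshold.
            self.threshold = max(self.floor, self.threshold - self.decrement)
        return fired


def trials_to_condition(neuron, max_trials=100):
    """Count paired trials until C.S. alone would elicit R (None if never)."""
    for trial in range(1, max_trials + 1):
        neuron.present(us=True, cs=True)
        if neuron.CS_BULBS >= neuron.threshold:
            return trial
    return None
```

With the default values chosen here, four paired trials bring the threshold from two bulbs down to one, after which C.S. alone is sufficient.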

It should be admitted that all of the mechanisms suggested de- 
pend to a greater or lesser extent on ad hoc arguments. In fact, one 
“pays” for the simplification of the model by introducing new ad hoc
hypotheses. This should, however, not in itself invalidate any at- 
tempt to develop a theory of learning from a particular chosen level. 

An important question which arises at this point is: What is
the mechanism responsible for this decrease in threshold? Variations
in threshold owing to various causes and under various conditions
have, of course, been studied in some detail. Rapid and transient 
changes of threshold are consistently associated with active neural 
tissue. More lasting and more drastic threshold variations can 
readily be induced by a great variety of chemical agents. All of these 
observations tend at least to make plausible the assumption concern- 
ing threshold variation which is required for this molecular model 
of learning. One of the advantages of Rashevsky’s model is that it 
does not require any such additional hypothesis but stems directly 
from the more usually considered properties of neurons. Be this as 
it may, we will assume in what follows that such threshold variations 
do occur and that, therefore, a learning theory based upon such phe- 
nomena is possible. It is suggested as an alternative to the Rashev- 
sky picture because it seems at least superficially to require simpler 
neural circuits and, in general, fewer neurons. 

The model as described so far is hardly more than a suggestion. 
It would appear natural at this point to examine in detail those as- 
pects of neural metabolism most intimately associated with threshold 
variations and to attempt to formulate a physicochemical theory of 
such changes which may then be checked by experiment. The results 
so obtained may then more legitimately be compared to those derived 
from the Rashevsky model. In this paper, however, a rather differ- 
ent avenue of development will be attempted. We will be concerned 
in what follows with the general question: Given that learning is 
characterized by more or less permanent threshold changes such as 
those described above, how much specificity of growth is required in 
the ontogeny of a nervous system in order that it possess potential- 
ities for learning? 

The proportions and generality of such a question make any 
attempt to answer it in one fell swoop, or even to give it rigorous 
meaning, an almost hopeless task. It can, however, serve, so to speak, 
as a “definition” of an area of problems to be considered. Proceed- 
ing, then, in this spirit, we will pose and attempt to analyze some 
extremely simplified and hypothetical versions of the problem with 
the intention of using any results obtained thereby as an aid to the 
solution of progressively more realistic problems. 


THE GANGLION-BRAIN 


Thus the first problem to be considered is the following. Assume 
that in some organism there exists a “central ganglion,” i.e., a sort 
of primitive brain which is nothing more than a collection of, say, N 
neurons which synapse upon one another in a completely random 
fashion. By randomness here we mean the following. In the course
of the formation of the ganglion (i.e., when the neuroblasts are growing
axones) there is no preferential synapsing. An axone growing
out of a particular cell body is equally likely to synapse upon one
cell body as upon another. Operationally this may be described
as follows. Let the neuroblasts be numbered in some way prior to 
the formation of the axones. After the axones have grown and the 
synapsing has occurred, let the number of each neuron be associated 
with the numbers of all the neurons to which it is directly connected. 
If no correlation can be established between the number of the neu- 
ron and the set of neurons to which it is connected, the connections 
will be said to be random. Assume, furthermore, that the afferent 
receptor tracts enter this aggregate and synapse randomly on the cell 
bodies within it. By “randomly” we again mean “without discernible 
order.” In other words, the neurons upon which the afferent axones
synapse are in no way distinguished (e.g. by their location) from 
each other. Finally, assume that some of the axones given off by the 
cells in the ganglion proceed peripherally to innervate the motor or- 
gans. (See Figure 6.) 


FIGURE 6 


Now suppose we examine the neural pathway which is followed 
by an unconditioned stimulus, from its detection at some receptor to 
the motor organs which eventually respond to it. Such a stimulus 
will follow a chain or bundle of afferent fibers, which lead into the 
central ganglion, wherein, through a number of internuncials, it will 
eventually arrive at the efferent pathways which lead out of the gan- 
glion toward the organs which give the response. For simplicity, let 
us assume that the afferent and efferent pathways are not bundles of 
neural chains, but simply one such chain of serially connected neurons
joined by a single internuncial within the ganglion.

Let us now consider any other receptor pathway (again a sim- 
ple chain) leading into the ganglion and there synapsing randomly 
upon one of the ganglion neurons. Given that the learning mechanism
is a threshold phenomenon as described above, can the motor organ
which unconditionally responds to the stimulation of the first
pathway be conditioned to respond to the stimulation of the second? 
An unambiguous answer to this question can be given only if we 
know whether or not the second receptor chain eventually (i.e., 
through a series of internuncials) connects to at least one of the neu- 
rons in the first chain. Since the connections in the ganglion are as- 
sumed to be random, this question cannot be answered by a yes or no 
but only by a probability statement. Rephrasing the question, we 
ask: Given any two neurons in such a randomly connected ganglion, 
what is the probability that the first neuron will, through any num- 
ber of internuncials, be connected to the second? This problem has 
been solved for a special case by A. Rapoport (1948). It is supposed 
there that every neuron in the ganglion has exactly one axone which 
synapses with equal probability on any one of the neurons in the 
ganglion. It is shown that under these assumptions the probability 
C that an arbitrarily selected neuron will be a member of a cycle is 
given by the equation 


$$C = \sqrt{\frac{\pi}{2N}}, \tag{1}$$


where N is the number of neurons in the ganglion. This is exactly 
the probability we are seeking, since the probability of starting the 
chain at one neuron and ending it on the same neuron is the same 
as starting the chain at one neuron and ending it on any other specified
neuron.

To be sure, this probability is small for large N. If, however, 
we suppose that not one but several axones emanate from each neu- 
ron in the ganglion, then the probability of an arbitrary pair being 
connected rises rapidly with the number of axones. 
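
The rise of this probability with the number of axones can be checked by a Monte Carlo sketch. It assumes, as in the special case above, that every axone synapses with equal probability on any neuron of the ganglion (self-synapses included); the function name, trial count, and seed are illustrative choices. For one axone per neuron the estimate may be compared with the asymptotic value $\sqrt{\pi/2N}$.

```python
import math
import random


def path_probability(n_neurons, axones_per_neuron, trials=2000, seed=0):
    """Estimate the chance that neuron 0 reaches neuron 1 in a random net."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each axone synapses on a neuron chosen uniformly at random.
        targets = [
            [rng.randrange(n_neurons) for _ in range(axones_per_neuron)]
            for _ in range(n_neurons)
        ]
        reached = {0}
        frontier = [0]
        while frontier:
            neuron = frontier.pop()
            for nxt in targets[neuron]:
                if nxt not in reached:
                    reached.add(nxt)
                    frontier.append(nxt)
        hits += 1 in reached
    return hits / trials


if __name__ == "__main__":
    n = 100
    print("asymptotic value for one axone:", math.sqrt(math.pi / (2 * n)))
    for k in (1, 2, 4):
        print(k, "axone(s) per neuron:", path_probability(n, k))
```

For a ganglion as small as the one simulated here the one-axone estimate falls somewhat below the asymptotic value, which is to be expected of a large-N formula.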

To view it in another way we can consider the first neuron of 
the arbitrary pair as an analogue to a person infected with a con- 
tagious disease living in a closed and healthy population. The total 
number of persons who ultimately succumb to the disease in the re- 
sulting epidemic will presently be shown as analogous to the total 
number of neurons ultimately connected to the first. The ratio, then, 
of this number to the total number of neurons in the ganglion will 


be the probability that there exists a path of internuncials leading 
from the first to the second. 
Analogy between the Contagion Equation and Probability of
Path Existence. 


Suppose first that the disease is not lethal; then every infected 
individual continues to live and infect others. The rate of infection 
in a population of n individuals will then be governed by the following
differential equation

$$\frac{dx}{dt} = kx(n - x), \tag{2}$$
where x is the total number of infected individuals (and, therefore, 
(n—x) is the total number of healthy individuals) and k is a time 
constant governing the rate of spread of the disease. It is easy to 
see that ultimately every individual in the population will become in- 
fected. 

In fact, the number of individuals which has become infected is
given by the well-known solution of the “logistic equation” (2)

$$x(t) = \frac{nAe^{nkt}}{1 + Ae^{nkt}},$$

where A = x(0)/[n − x(0)].
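
As a check, the standard closed form of the logistic solution, x(t) = nA exp(nkt)/[1 + A exp(nkt)] with A = x(0)/[n − x(0)], can be substituted back into equation (2) numerically. The sketch below does this with a centered difference; the function names and the step size are illustrative assumptions.

```python
import math


def logistic(t, n, k, x0):
    """Closed-form solution of dx/dt = k*x*(n - x) with x(0) = x0."""
    a = x0 / (n - x0)
    # Equivalent to n*a*exp(n*k*t) / (1 + a*exp(n*k*t)), written so that
    # large t does not overflow.
    return n / (1.0 + math.exp(-n * k * t) / a)


def residual(t, n, k, x0, h=1e-6):
    """Difference between a centered-difference dx/dt and k*x*(n - x)."""
    x = logistic(t, n, k, x0)
    dxdt = (logistic(t + h, n, k, x0) - logistic(t - h, n, k, x0)) / (2 * h)
    return dxdt - k * x * (n - x)
```

The residual stays near zero at every t, and the solution tends to n as t grows, in agreement with the statement that every individual is ultimately infected.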

Translating this situation into terms of neurons, axones, and 
connections, this would mean that a single neuron begins the “infection”
by giving off axones at a constant rate and each neuron
which receives one of these axones begins, in turn, to grow axones at 
the same rate affecting still other neurons in the same way. Clearly, 
then, any neuron which at any time is thereby brought into this sys- 
tem will be accessible through a pathway leading from the instigat- 
ing neuron. As in the case of contagion, every neuron will ultimately 
be brought into the system. Note, however, that the actual situation 
in our ganglion is somewhat different. The difference lies in the fact 
that in our ganglion each neuron gives off only a limited number of 
axones, whereas the contagion equation implies that each neuron con- 
tinues to give off axones indefinitely. This difficulty can be remedied 
by a proper modification of the contagion equation. This modifica- 
tion can be considered, in the language of epidemic, as the introduc- 
tion of a death rate with a constant time lag between infection and 
death. The death of an individual is then analogous to the exhaus- 
tion of any one neuron’s “supply” of axones. Just as the former in- 
dividual stops infecting others, so does the latter stop making new 


connections. 
We introduce the following variables in the modified contagion
equation: 


x(t) = Number of individuals sick or dead at time t,
y(t) = Number of individuals dead at time t,
n = Total number of individuals in original population,
τ = Time lag between infection and death,
k = Time constant.


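In this notation we then have

$$\frac{dx}{dt} = k(x - y)(n - x), \tag{3}$$

$$y = \begin{cases} 0 & \text{for } t \le \tau, \\ x - \displaystyle\int_{t-\tau}^{t} \frac{dx}{dt'}\,dt' & \text{for } t > \tau, \end{cases} \tag{4}$$

whence we obtain y(t) = x(t − τ), for t > τ. Substituting this into
(3), we obtain

$$\frac{dx}{dt} = k[x(t) - x(t - \tau)](n - x). \tag{5}$$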


The problem now reduces to the solution of the following system:

$$\begin{aligned} \frac{dx}{dt} &= kx(n - x), & 0 < t \le \tau, \\ \frac{dx}{dt} &= k[x(t) - x(t - \tau)](n - x), & t > \tau. \end{aligned} \tag{6}$$

The system (6) can also be expressed by equation (5) alone with the
initial condition on the function x(t), namely

$$x(t) = 0 \quad \text{for } t < 0.$$


The solution of this system behaves for t < τ in exactly the same
way as the solution of the ordinary contagion equation (2). However,
increasing τ corresponds in the neuron-axone language to a
large number of axones per neuron, and so, as one would expect, for 
a sufficiently large number of such axones, the probability of any 
specified connection approaches unity. 
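
The dependence of the final outcome on the time lag can be seen in a rough numerical sketch of the delayed contagion equation, dx/dt = k[x(t) − x(t − τ)](n − x), with x(t) = 0 for t < 0. The forward Euler scheme, the step size, the starting value x(0) = 1, and the function name are assumptions made for illustration and are not part of the theory.

```python
def final_fraction(n, k, tau, x0=1.0, dt=0.01, t_max=60.0):
    """Euler-integrate dx/dt = k*[x(t) - x(t - tau)]*(n - x); return x/n at t_max.

    The history is x(t) = 0 for t < 0 and x(0) = x0.
    """
    lag = int(round(tau / dt))  # the delay measured in time steps
    xs = [x0]
    for i in range(int(round(t_max / dt))):
        x = xs[i]
        delayed = xs[i - lag] if i >= lag else 0.0
        x_next = x + dt * k * (x - delayed) * (n - x)
        xs.append(min(x_next, n))  # guard against overshoot past n
    return xs[-1] / n


if __name__ == "__main__":
    for tau in (0.5, 1.5, 3.0):
        print("tau =", tau, "fraction reached:", final_fraction(100, 0.01, tau))
```

In this sketch a short lag leaves most of the population untouched, while a long lag brings the fraction ultimately reached close to unity.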


Learning as Simple Conditioning. 
Returning now to the model depicted in Figure 6, it should be 
pointed out that the assurance of a ganglion with sufficiently dense
connections does not necessarily imply an adequate learning mechan- 
ism. Although in such a model any response can be conditioned to 
almost any stimulus, a serious difficulty arises in that the lowering of 
threshold concomitant with a particular learning experience will, in 
general, involve many other afferent pathways. In other words, a re- 
sponse may be conditioned to many different stimuli without ever 
having been associated with them. Such conditioning we shall call 
“redundant conditioning.” 

To be sure, the phenomena of abstraction and generalization in- 
volve just such processes and can conceivably be explained by such 
“redundant conditioning.” However, on a different level the “abstractions”
are the “stimuli” to be differentiated. Here again one
may argue that a higher generalization process may take place, per- 
haps again by the mechanism of redundant conditioning. It is clear, 
however, that for any given level of abstraction, an effective nervous 
system will provide for a certain optimum specificity. Thus, for a 
given level of abstraction, an “optimum net” represents some sort of 
balance between spread of response and specificity. Whatever that 
optimum may be, it is clear that “total redundancy,” i.e. where all 
conditionable pathways are conditioned together, is hardly desirable. 
In such a situation, “everything is everything else” for the organism
and discrimination is impossible.

This difficulty can be overcome by a generalization of the model. 
Assume now that the unconditioned stimulus-response pathway con- 
sists not of a single neural chain but makes up a bundle of s such 
parallel chains, each of which has one neuron in the ganglion. This 
model is illustrated in Figure 7. 

FIGURE 7


The other afferent fibers constituting the pathways for other
stimuli also arrive as tracts rather than single neural chains. These
then synapse randomly on the cell bodies of the ganglion. At this
point a somewhat different concept of threshold is introduced. In 
order for a given response to occur in the motor organ we require 
that a certain minimal number, h < s, of the fibers (single neural 
chains) innervating it must be active. Clearly this condition is al- 
ways satisfied for a sufficiently strong stimulus traversing the un- 
conditioned pathway. However, it is possible for this response to be 
conditioned to any other stimulus only if at least h distinct efferent 
pathways are accessible to the stimulus in question. As has been 
shown above, this condition will prevail when the interconnections 
of the ganglion neurons are sufficiently dense. Here again, since we 
assume that the interconnections in the ganglion (other than the un- 
conditioned pathway) are random, we can speak only of the prob- 
ability that for any given case the requirement for the possibility of 
conditioning be met. This probability can be expressed in terms of 
s, h, and p, the probability of a single efferent pathway being accessible
to the afferent tract. In fact, this is simply the expression for
the probability of at least h successes in s independent trials, each
with a constant probability p. The probability $C_h$ of exactly h
successes in s trials is given by (Uspensky, 1937)


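$$C_h = \frac{1}{\sqrt{2\pi spq}}\, e^{-t^2/2} \left[ 1 + \frac{(q - p)(t^3 - 3t)}{6\sqrt{spq}} \right] + \Delta, \tag{7}$$

where $t = (h - sp)/\sqrt{spq}$, $q = 1 - p$, and

$$|\Delta| < \frac{0.15 + 0.25\,|p - q|}{(spq)^{3/2}} + e^{-\frac{3}{2}\sqrt{spq}},$$

provided spq ≥ 25.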


Thus the probability C of at least h “successes” is given by the 
expression 


$$C(h,s,p) = \sum_{j=h}^{s} C_j. \tag{8}$$

If h, s, and p are considered as parameters of the ganglion, 
then C(h,s,p) is also a parameter of the ganglion. We shall refer
to it as the “conditioning potential.” It is a measure of the number 
of pathways of the ganglion which can be conditioned to the uncon- 
ditioned stimulus-response pathway. Evidently, then, organisms with 
a high conditioning potential will have a greater potential repertoire
of learned responses.
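
For small s the conditioning potential can be computed directly from the exact binomial terms, without the approximation. The sketch below does so and also evaluates the leading part of the normal approximation with its skewness correction, omitting the error term; the function names are hypothetical.

```python
from math import comb, exp, pi, sqrt


def exact_term(j, s, p):
    """Exact probability of exactly j successes in s trials."""
    return comb(s, j) * p**j * (1.0 - p) ** (s - j)


def conditioning_potential(h, s, p):
    """C(h, s, p): probability of at least h successes in s trials."""
    return sum(exact_term(j, s, p) for j in range(h, s + 1))


def approx_term(h, s, p):
    """Normal approximation with skewness correction (error term dropped)."""
    q = 1.0 - p
    sigma = sqrt(s * p * q)
    t = (h - s * p) / sigma
    correction = 1.0 + (q - p) * (t**3 - 3.0 * t) / (6.0 * sigma)
    return exp(-t * t / 2.0) / (sigma * sqrt(2.0 * pi)) * correction
```

For example, with h = 3, s = 6, and p = 1/2 the exact sum is 42/64, so that roughly two afferent tracts in three could be conditioned to the response.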
However, as we have seen, the usefulness of a learning mechanism
depends not only upon the high probability of effective connections
but also upon the non-interference of the various afferent tracts
with one another. In our present model it may appear that as in the 
previous case (single chain afferents), the high accessibility of effer- 
ent pathways to all stimuli also implies the high probability of “re- 
dundant” conditioning. In order to illustrate how redundant condi- 
tioning can be avoided by our generalized model while high accessi- 
bility of efferent pathways to the afferent tracts is still maintained, 
we will refer to Figure 8. Note that the pathways from $S_1$ and $S_2$ to
R are drawn as direct connections. This need not be the case. In 
general, these connections occur through internuncials. 


Here, as usual, the afferent fibers which carry the unconditioned 
stimulus U.S. synapse on an equal number of cell bodies in the gan- 
glion which then send axones (efferent) to the outgoing bundle which 
stimulates R. For purposes of illustration, let us arbitrarily take 
h = 3 and s = 6. A glance at Figure 8 shows that both $S_1$ and $S_2$
satisfy the conditions for potential conditioning to R; i.e., both $S_1$
and $S_2$ have h = 3 common paths with U.S. Nevertheless the conditioning
of R to $S_1$ does not imply the conditioning of R to $S_2$. This
is so because fewer than h of the paths which $S_1$ and $S_2$ each have in
common with U.S. are overlapping paths. In other words, those neurons
whose thresholds are lowered as a result of conditioning R to $S_1$
are not all the same neurons whose thresholds must be lowered in
order to condition R to $S_2$.

Of course, as h approaches s the probability of “redundant”
learning increases. Also, as p approaches unity the probability of
redundant learning increases. 

Now we seek an expression for the probability of redundant 
conditioning in a net such as that illustrated in Figure 8. That is to 
say, suppose that two afferent tracts $S_1$ and $S_2$ each have a sufficient
number of paths in common with the ganglionic internuncials of
U.S. leading to R so that R can be conditioned to either of them. Suppose
further that the numbers of common paths $S_1$ and $S_2$ have with
U.S. respectively are a and b, a ≥ h, b ≥ h. We ask: What is the
probability that h or more of these pathways are common to $S_1$ and
$S_2$?

This probability can be computed in terms of the solution to the
following mathematically equivalent problem. Suppose an urn contains
m white and n black balls. If one removes λ balls from the urn,
what is the probability, P, that μ of the λ balls will be white and ν
will be black?

The analogy between this problem and ours can be seen as follows.
The number, a, of common paths between $S_1$ and U.S. corresponds
to the m white balls in the urn. The (s − a) remaining paths
of U.S. correspond to the n black balls. The number, b, of common
paths between $S_2$ and U.S. corresponds to λ, the total number of balls
removed from the urn. We now ask: How many of the b paths common
to $S_2$ and U.S. are “white,” i.e., also common to $S_1$ and U.S., and
how many are “black,” i.e., not common to $S_1$ and U.S.?

The answer to this problem of white and black balls was given 
by A. Markoff (1913). It is 


$$P = \frac{(\mu + \nu)!}{\mu!\,\nu!} \cdot \frac{m(m-1)\cdots(m-\mu+1)\; n(n-1)\cdots(n-\nu+1)}{(m+n)(m+n-1)\cdots(m+n-\mu-\nu+1)}. \tag{9}$$


Translating this expression into our notation we obtain the fol- 
lowing for the probability, P(a,b,r), of r “redundant” connections 
between two afferent tracts of respective numbers of “conditionable” 
paths, a and b, 


$$P(a,b,r) = \frac{b!\; a(a-1)\cdots(a-r+1)\; (s-a)(s-a-1)\cdots(s-a-b+r+1)}{r!\,(b-r)!\; s(s-1)\cdots(s-b+1)}. \tag{10}$$

The probability that the redundancy r reaches or exceeds the threshold h is
given by the expression

$$P(a,b) = \sum_{r=h}^{c} P(a,b,r), \tag{11}$$

where

$$c = \min(a,b). \tag{12}$$
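
Expression (10) is the hypergeometric probability, and it can be written compactly with binomial coefficients as C(a, r) C(s − a, b − r) / C(s, b). A minimal sketch of (10) and (11) in this form follows; the function names are hypothetical, and the arguments are assumed to satisfy 0 ≤ a, b ≤ s.

```python
from math import comb


def p_redundant(a, b, r, s):
    """P(a, b, r): exactly r of the b paths of S2 are among the a paths of S1.

    Binomial-coefficient form of expression (10); s is the total number
    of chains in the unconditioned pathway.
    """
    if r < 0 or r > min(a, b):
        return 0.0
    return comb(a, r) * comb(s - a, b - r) / comb(s, b)


def p_redundancy_at_least(a, b, h, s):
    """P(a, b): probability that h or more paths are shared."""
    return sum(p_redundant(a, b, r, s) for r in range(h, min(a, b) + 1))
```

For the illustrative values h = 3 and s = 6 used earlier, with a = b = 3, this gives P(a, b) = 1/20: two tracts that each just meet the threshold will rarely be conditioned together.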


From the quantity P(a, b) and the components $C_j$ of the conditioning
potential it is possible to write an expression for the “redundancy
potential” of the ganglion. This is given by


$$R(s,h,p) = \sum_{i=h}^{s} \sum_{j=h}^{s} C_i C_j P(i,j). \tag{13}$$
The redundancy potential is, therefore, simply the sum of the P’s 
properly weighted by the C’s which denote the probability of their 
occurrence. 
Thus C measures the number of things which can be learned, 
while R measures the interference between the various learned
responses.


Implications of the Model. 


So far it has been shown that a collection of neurons having 
only certain assumed properties of threshold variation and possess- 
ing only a minimum of specificity in its connections (the uncondi- 
tioned tracts are assumed to lead directly to motor responses) can be 
connected in a completely random fashion and nevertheless possess 
simple properties of learning. 

We will now examine some of the implications of this model 
which have not yet been mentioned and some of the problems which 
they suggest. 


The Optimal Net. 


As was mentioned in the foregoing [see expression (8) ], the prob- 
ability that a given afferent tract has enough paths in common with 
any specified unconditioned pathway (what we have called the con- 
ditioning potential) depends upon three variables, namely, p, s, and 
h. As the probability p, that any one chain of the unconditioned
tract is linked with the afferent tract, rises, so also the conditioning
potential C rises. But as p approaches unity, a and b approach s.
Under these circumstances the probability that the redundancy r
between any two afferent tracts exceeds h approaches unity.

In other words, if we wish to guarantee that every unconditioned 
tract has a sufficient number of paths in common with every other 
afferent tract, redundancy of connections exceeding threshold will 
become a certainty. In the extreme case we would have an animal 
which could associate any two stimuli with the same response but
once having done so for any two stimuli would automatically have 
associated every other possible stimulus with that same response. An 
animal possessing a “brain” near this extreme would readily learn 
but would equally readily “confuse” what it had learned. It would 
quickly reach a “saturation” point where it could, literally, not dis- 
tinguish any more new “information.” 

It is seen then that conditioning potential and redundancy are 
in a sense antagonistic to one another. It may perhaps be possible 
to find a “natural” definition for some kind of optimum balance be- 
tween these two tendencies. This would lead to a well-posed mathe- 
matical problem of determining the relations between the variables 
which must prevail in order for the “ganglion-brain” to achieve this 
optimum balance. 


A possible suggestion is the following. Since C measures the
probability that a given stimulus has adequate connections to a given
response, the quantity 1 − C is the probability of failure. The quantity
β(1 − C) would be a measure of the “loss” which is incurred by
the organism because of this inadequacy. The constant β would of
course depend upon the particular environment which the organism
inhabits. If now we assume that maximum specificity of response is
optimal then the quantity γR will also be a measure of a certain
“loss” to the organism. The constant γ is again determined by the
environment. The total “loss” to the organism would then be given
by the expression

$$L = \beta(1 - C) + \gamma R. \tag{14}$$


The construction of the “optimum net” would then involve the 
minimization of expression (14). This in turn imposes conditions 
on the parameters of the net upon which both C and R ultimately 
depend. 
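
To indicate what such a minimization might look like, the sketch below evaluates the loss on a grid of values of p for fixed s and h. It takes the loss to be a weighted sum of 1 − C and R, with C the probability of at least h accessible chains out of s and R the redundancy potential built from the same terms. It uses exact binomial terms rather than approximation (7); the two weights (called beta and gamma here), the grid search, and all function names are assumptions made for illustration.

```python
from math import comb


def c_term(j, s, p):
    """Exact binomial probability that exactly j of s chains are accessible."""
    return comb(s, j) * p**j * (1.0 - p) ** (s - j)


def overlap_at_least(a, b, h, s):
    """P(a, b): chance that tracts with a and b common paths share >= h."""
    return sum(
        comb(a, r) * comb(s - a, b - r) / comb(s, b)
        for r in range(h, min(a, b) + 1)
    )


def loss(p, s, h, beta, gamma):
    """Weighted sum beta*(1 - C) + gamma*R for one value of p."""
    terms = {j: c_term(j, s, p) for j in range(h, s + 1)}
    c = sum(terms.values())
    r = sum(
        terms[i] * terms[j] * overlap_at_least(i, j, h, s)
        for i in terms
        for j in terms
    )
    return beta * (1.0 - c) + gamma * r


def best_p(s, h, beta, gamma, steps=100):
    """Grid search for the p in [0, 1] that minimizes the loss."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda p: loss(p, s, h, beta, gamma))
```

At p = 0 the loss equals the first weight (nothing can be learned) and at p = 1 it equals the second (everything is confused with everything else), so for comparable weights the minimum found by `best_p` lies at an intermediate density of connections.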

This definition of “optimum net” is obviously a very special one, 
since it assumes that maximum specificity is the most desirable. More 
generally this need not be the case. The general theory of optimal 
nets leads to a separate set of problems. 


Specific Learning Abilities. 


Suppose we would modify the connections in the ganglion so that 
instead of being completely random, certain regions would have 
axones which tend to grow in some given direction. We would evi- 
dently find that such an animal’s ability to associate certain stimuli 
with a given response would be exceptionally high as compared with
other stimuli. If a population of such animals possessing different 
“biases” in the direction of axones in their “ganglion-brains” were 
tested for their ability to learn certain simple patterns we would find 
that certain members of the population would be especially “intelli- 
gent” with respect to any specified pattern. On the other hand, if we 
compared the learning ability of the “intelligent-for-pattern no. 1” 
group with that of the “stupid-for-pattern no. 1” group with respect 
to pattern no. 2, we would have no a priori reason, judging from our 
model, to assume that the “intelligent” group would retain its su- 
periority for the second pattern. This is reminiscent of R. C. Tryon’s 
work (1931) in the selection of rats which were exceptionally ca- 
pable of solving certain mazes. When tested with new mazes the ex- 
ceptional rats showed only ordinary ability. 


Relation to Evolution. 


Suppose that some population of animals possessing such gan- 
glion-brains with various hereditary growth “biases” is living in a 
more or less constant environment. In general, we would expect that 
the ability to make certain simple “associations” would increase the 
likelihood of its survival. Furthermore, any mutants which enhance 
such growth “tendencies” without significantly changing the other 
features of the species would also be selected for. Such a selection 
process, it would seem, should eventually lead to some rather specific 
neural orientation especially adapted to the enhancement of the “valuable” 
association. In the process of such evolution we might expect 
that eventually the members of the population would have such a 
great density of “right” connections that learning might be simply 
a matter of a few trials. At this point slight hereditary changes in 
the neural thresholds would make the response “instinctive.” 

These considerations may indicate the way to answer the per- 
plexing question concerning the evolution of complex “instinctive” 
behavior patterns. It has always been difficult to imagine that such 
patterns arose through gradual synthesis of simpler “units of be- 
havior,” because the simpler units by themselves did not appear to 
have any survival value. It was hard to conceive that the mud wasp 
had the habit of flying about with a ball of mud for a million years 
until it finally decided to use it in house building. Why should a habit 
of doubtful survival value have been retained so long? 

The implications of the ganglion-brain random net model view 
the evolution of instinctive behavior as derived from more and more 
easily learned patterns, which become gradually more intimately 


Organization versus Chaos. 


Notice that in this primitive organism which evolved a simple 
innate behavior pattern, a portion of its random neural tissue has 
been “sacrificed.” As more and more such innate patterns evolved, 
progressively more random tissue would become organized. The new- 
ly acquired innate behavior patterns are, of course, assumed to be 
an “advantage” to the species, but, on the other hand, the ability 
to learn new responses (which ability depends, according to our 
model, upon random tissue) is presumably also of importance. We 
might expect then that in the course of its evolution the species would 
“attempt to compensate” for this loss by increasing the size of its 
“ganglion-brain” and thereby replenish some of its random tissue. 
Notice also that the geometry of such development would imply that 
the more highly channelized and older circuits would find themselves 
more and more deeply imbedded in the less specialized tissue. There 
is some evidence to indicate that this is actually the case. 


PROGRESSIVE REFINEMENT OF RESPONSE IN LEARNING 


The kind of learning so far discussed is of a rather simple nature 
in which a learned response is either categorically given or not given. 
In reality most responses are graded. Learning is usually not a ques- 
tion of yes or no, but, rather, how much. In any coordinated activity 
an optimal response to a given stimulus involves not merely the com- 
bined activity of a group of motor organs but, in general, requires 
that each element in the combination respond with a certain “opti- 
mal” intensity. How can such adjustments of graded responses be 
linked with the learning processes? 

We will describe an exceedingly simple model which may account 
for such phenomena. The model lends itself to rather obvious gen- 
eralizations. For greater clarity the simple version will be discussed 
in detail. The possible generalizations will then be indicated. 

In the model which we are about to consider the grading of a 
response will be represented by a spatial coordinate. This procedure 
will not be unfamiliar to those acquainted with the “spatializing” of 
graded responses in physiological psychology as, for example, in the 
theories of hearing. In our model, therefore, the “narrowing down” 
of a response toward an “optimum” will be represented as a progressive 
localization. This “localization” is actually an artifact of our 
representation and need not imply geographic localization in the nervous 
system. The model is illustrated by Figure 9. 


FIGURE 9. (Diagram not recoverable; the only surviving label is an arrow marked RESPONSES.) 


Let S represent some stimulus (for simplicity of discussion let 
us consider it to be a noxious stimulus) whose afferents innervate an 
aggregate of neurons which in Figure 9 is represented by a coordinate 
x. The instantaneous thresholds hᵢ of the neurons in the array 
are assumed to fluctuate about some mean value h. This quantity is 
a function both of x and, as will be shown below, implicitly of time. 
Thus 


h=h(x,t). (15) 


Future references to the “threshold” will imply this mean value 
h. We shall refer to h(x,0) as the initial distribution of thresholds 
h₀(x). For any given subthreshold stimulus intensity we should expect 
that the instantaneous threshold hᵢ will, owing to its fluctuations, 
occasionally fall sufficiently low to permit the neurons associated 
with it to fire. However, if h is too large at certain points on 
the linear array and the amplitude of the fluctuations never exceeds 
a certain maximal amplitude, then the neurons associated with these 
points would, for certain stimulus levels, be effectively non-func- 
tional. 

Now to proceed with the description of our model, we will assume 
that the firing of any neuron in the array results in motor activity 
which manipulates the environment in such a way as to increase 
or decrease the noxious stimulus S. This effect E on S is not 
the same for all neurons in the array but, in fact, is a function of 
the “position,” x, of the neuron in the array: E = E(x). 

An essential feature of the model is the so-called success center. 
Here again the word center is used in a functional rather than anatomical 
sense. A certain level of activity is assumed always to be 
present in the center. The neurons in it are innervated by tracts 
leading from the receptors of the noxious stimulus S in such a way 
that its activity is inhibited in proportion to the intensity of S. Thus 
the decrease of the intensity of S tends to increase the level of ac- 
tivity in the center. This implies that the neurons in the linear array 
(by stimulating motor activity which in turn manipulates the en- 
vironment and thus affects S) thereby have a differential effect upon 
the activity in the success center. This differential effect may be 
represented by a function of “position” x, which may, in fact, be 
taken to be E(x). The activity of the success center lowers the 
thresholds of the linear array in accordance with the previously sug- 
gested mechanism, i.e., the amount of lowering of the threshold of a 
particular neuron depends not only upon the activity of the success 
center but also on the activity of the neuron. 

We will suppose that h(x) is lowered by an amount proportional 
to the number of times, N, the neuron at x has fired. (To be exact, 
N refers to the expected number.) In order to express N as a function 
of the parameters of the model it will be necessary to introduce 
some notation. Let η(ξ) denote the relative frequency distribution 
of the instantaneous threshold hᵢ per unit time. The probability per 
unit time that the instantaneous threshold hᵢ is below the stimulus 
level S is then given by the expression 


$$\int_{\xi < S} \eta(\xi)\, d\xi. \qquad (16)$$


The expected number of times N that the neuron in question has 
fired up to time t will, therefore, be given by the relation 


$$N = \int_0^t \int_{\xi < S} \eta(\xi)\, d\xi\, d\tau. \qquad (17)$$


To express the lowering of the initial threshold of a neuron by 
the activity of the success center we have the equation 


$$h = h_0 - \alpha E N, \qquad (18)$$


where α is a constant and where h is taken to be identically zero for 
all αEN > h₀. This latter condition is necessitated by the consideration 
that negative h is physically meaningless. In other words, we 
suppose that the decrease in threshold is linear and proportional to 
E, the effectiveness of the activity resulting from the firing of the 
neuron with respect to decreasing the intensity of the noxious stimulus, 
S (and thus increasing the activity of the success center). It 
is also proportional to N, the number of times the neuron has fired 
(and thus had its threshold lowered by the activity of the success 
center). 


Introducing the expression for N [equation (17)] into equation 
(18) we obtain 


$$h = h_0 - \alpha E \int_0^t \int_{\xi < S} \eta(\xi)\, d\xi\, d\tau. \qquad (19)$$


If the distributions h₀(x), E(x), and η(ξ) were known we could 
theoretically solve for h in terms of x and t. The narrowing down of 
the response to the optimal one would then consist in the “sharpening” 
of h(x,t), considered as a function of x, as time progresses. In 
other words, learning would be reflected in the appearance of a mini- 
mum in the graph of h(x) and in the progressively narrowing “in- 
vagination” of the graph around that minimum. The appearance of 
more than one minimum would imply that alternative optimal re- 
sponses to the stimulus were possible. 

The sharpness of the graph at the minimum will imply that a 
certain neuron of the linear array is much more likely to fire as a 
response to the stimulus than any other neuron in the array. The 
net will have learned to respond in a specific way to the stimulus. 

In order to illustrate this phenomenon we can assume that the 
distributions appearing in equation (19) are of very simple form. 
In particular let us put h₀ = constant. Furthermore, let η(ξ) be a 
rectangular distribution of width 2A. This simply says that all pos- 
sible fluctuations occur with equal frequency. 
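
For a rectangular distribution the firing probability has a simple closed form. The sketch below assumes the instantaneous threshold is uniform on [h − A, h + A] (one reading of "rectangular distribution of width 2A" centered on the mean h), so that the probability of falling below S is (S + A − h)/(2A). The function names and numerical values are illustrative only, and a small Monte Carlo run is included as a cross-check.

```python
import random

def firing_probability(h, S, A):
    """P(h_i < S) when h_i is uniform on [h - A, h + A], clipped to [0, 1]."""
    p = (S + A - h) / (2.0 * A)
    return min(1.0, max(0.0, p))

def simulated_firing_probability(h, S, A, trials=200000, seed=0):
    """Monte Carlo estimate of the same probability (assumed uniform fluctuations)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.uniform(h - A, h + A) < S)
    return hits / trials

# Illustrative values: mean threshold above the stimulus level but within reach.
h, S, A = 1.5, 1.0, 1.0
print(firing_probability(h, S, A), simulated_firing_probability(h, S, A))
```

The constant factor 1/(2A) only rescales the firing rate, which is presumably why a suitable choice of time units lets the integrand be written simply as S + A − h.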

Note that according to these assumptions it may be that for 
some of the neurons in the array we would have h > S + A. This 
would mean that such neurons would never be active. If the stimu- 
lus level were raised, however, or if the range of fluctuation A were 
increased, then such neurons would again become functional. For 
the sake of definiteness we will exclude such neurons from our con- 
sideration, i.e. we will assume in what follows that h is always 
smaller than S + A. Note also that if the mean value of the thresh- 
old of some neuron is lower than the stimulus level S, such a neu- 
ron would be firing almost continuously whenever S is acting. This 
may be considered analogous to an instinctive or reflexive response to 
a specific stimulus. We will also exclude these cases from our considerations 
and take our analysis to be meaningful only so long as 
S < h < S + A. 

Using these assumptions then, after a suitable choice of time 
units, equation (17) becomes 


$$N = \int_0^t [S + A - h]\, d\tau. \qquad (20)$$


Now by letting I = S + A and substituting the expression for N into 
equation (19) we obtain 


$$h = h_0 - \alpha E \int_0^t [I - h]\, d\tau; \qquad (21)$$

whence by differentiation 

$$\frac{dh}{dt} = -\alpha E (I - h), \qquad (22)$$

so that 

$$\frac{dh}{h - I} = \alpha E\, dt. \qquad (23)$$

This leads to the solution 

$$\log C(h - I) = \alpha E t, \qquad (24)$$

where C is the constant of integration. The evaluation of C by setting 
t = 0 yields 

$$C = (h_0 - I)^{-1}; \qquad (25)$$

therefore 

$$\frac{h - I}{h_0 - I} = e^{\alpha E t}, \qquad (26)$$

whence 

$$h(x,t) = I - [I - h_0(x)] \exp[\alpha E(x)\, t], \qquad I > h > I - A. \qquad (27)$$
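
As a check on the solution (27), the differential equation (22) can be integrated numerically and compared with the closed form. The sketch below uses a classical fourth-order Runge-Kutta step; the parameter values (h₀ = 1.8, I = 2, α = 1) are arbitrary illustrations chosen so that h stays inside the stated range.

```python
import math

def threshold_closed_form(h0, I, alpha, E, t):
    """Equation (27): h = I - (I - h0) * exp(alpha * E * t)."""
    return I - (I - h0) * math.exp(alpha * E * t)

def threshold_numeric(h0, I, alpha, E, t, steps=10000):
    """Integrate dh/dt = -alpha * E * (I - h), equation (22), with RK4."""
    def rate(h):
        return -alpha * E * (I - h)
    h, dt = h0, t / steps
    for _ in range(steps):
        k1 = rate(h)
        k2 = rate(h + 0.5 * dt * k1)
        k3 = rate(h + 0.5 * dt * k2)
        k4 = rate(h + dt * k3)
        h += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return h

# Illustrative values only (S = 1, A = 1, so I = S + A = 2).
h0, I, alpha = 1.8, 2.0, 1.0
for E in (0.25, 0.5, 1.0):
    print(E, threshold_closed_form(h0, I, alpha, E, 1.0),
          threshold_numeric(h0, I, alpha, E, 1.0))
```

The larger the effectiveness E, the faster the threshold falls, which is the mechanism the subsequent analysis relies on.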

Now in order to obtain h as an explicit function of x and t we 
must stipulate the form of the distribution E(x). Let us choose the 
origin of the linear array to be at the center, and the effectiveness, 
E, to be of the form E(x) = 1/(1 + x²). We are interested in the 
graph of h(x, t) as it changes with time. In particular, does a minimum 
appear, and does the second derivative at that minimum grow? 
These criteria are not sufficient to insure the “sharpness” of the 
graph at the minimum. We can, however, examine other criteria, for 
example, the coming together of the inflexion points on both sides of 
the minimum and the value of the first derivative at the inflexion 
points. Again we may measure the sharpness of the minimum by 
the standard deviation of the function h(x, t) considered as a distribution. 

Furthermore, let I and h₀ be constant throughout the array. With the 
origin of the linear array taken at the center, the effectiveness, E, is 
greatest there. We note that 


$$E'(x) = -\frac{2x}{(1 + x^2)^2}, \qquad (28)$$

$$E''(x) = \frac{6x^4 + 4x^2 - 2}{(1 + x^2)^4}. \qquad (29)$$


The first derivative (28) vanishes at x = 0 and has the sign of −x, 
while the second derivative (29) is negative in the neighborhood 
of x = 0. Furthermore, 

$$\frac{\partial h}{\partial x} = -(I - h_0)\, \alpha t\, e^{\alpha E t} E'(x). \qquad (30)$$

This vanishes only where E′(x) = 0, that is, at x = 0. For the second 
derivative, we have 

$$\frac{\partial^2 h}{\partial x^2} = -(I - h_0)\, \alpha^2 t^2 e^{\alpha E t} (E')^2 - (I - h_0)\, \alpha t\, e^{\alpha E t} E''. \qquad (31)$$

This is positive at x = 0. Thus the existence of a unique minimum 
is assured. From equation (31), we see that ∂²h/∂x² does increase 
with time at the minimum; in fact, it increases rather rapidly since 
it is proportional to t e^{αt}. 
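
The curvature at the minimum can be checked by finite differences. At x = 0 the first derivative of E vanishes and the second equals −2, so the second derivative of h there reduces to 2(I − h₀)αt e^{αt}. The sketch below compares this with a central difference of h(x, t) = I − (I − h₀) exp[αE(x)t]; the constants are arbitrary illustrations.

```python
import math

H0, I, ALPHA = 1.8, 2.0, 1.0   # illustrative values only

def effectiveness(x):
    """E(x) = 1 / (1 + x^2), greatest at the origin."""
    return 1.0 / (1.0 + x * x)

def threshold(x, t):
    """h(x, t) = I - (I - h0) * exp(alpha * E(x) * t)."""
    return I - (I - H0) * math.exp(ALPHA * effectiveness(x) * t)

def curvature_numeric(t, dx=1e-3):
    """Central-difference estimate of the second x-derivative of h at x = 0."""
    return (threshold(dx, t) - 2.0 * threshold(0.0, t) + threshold(-dx, t)) / dx**2

def curvature_formula(t):
    """Second derivative at x = 0: 2 * (I - h0) * alpha * t * exp(alpha * t)."""
    return 2.0 * (I - H0) * ALPHA * t * math.exp(ALPHA * t)

for t in (0.25, 0.5, 1.0):
    print(t, curvature_numeric(t), curvature_formula(t))
```
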
To obtain the behavior of the inflexion points, we set ∂²h/∂x² = 0, 
whence 

$$\alpha t\, (E')^2 + E'' = \frac{4 \alpha t\, x^2}{(1 + x^2)^4} + \frac{6x^4 + 4x^2 - 2}{(1 + x^2)^4} = 0, \qquad (32)$$

which, after simplification and rearrangement, yields 

$$6x^4 + 4x^2(\alpha t + 1) - 2 = 0. \qquad (33)$$


The roots of equation (33) give the positions of the inflexion points 
in the graph of h(x) for each t. By the symmetry of h(x), they are 
equidistant from the origin, and the distance between them is measured 
simply by the coordinate of either one of them. But the solution 
of (33) for x² gives for the significant (positive) root 


$$(x^*)^2 = \tfrac{1}{3}\left[-\alpha t - 1 + \sqrt{(\alpha t + 1)^2 + 3}\,\right]. \qquad (34)$$


We seek the behavior of (x*)² as time increases. Its derivative with 
respect to time is given by 

$$\frac{d[(x^*)^2]}{dt} = \frac{\alpha}{3}\left[-1 + \frac{\alpha t + 1}{\sqrt{(\alpha t + 1)^2 + 3}}\right]. \qquad (35)$$


We wish to know whether the right side of (35) is negative, i.e., 
whether the points of inflexion are approaching each other. This will 
be so if 


$$\frac{\alpha t + 1}{\sqrt{(\alpha t + 1)^2 + 3}} < 1, \qquad (36)$$

$$\alpha t + 1 < \sqrt{(\alpha t + 1)^2 + 3}, \qquad (37)$$

or 

$$(\alpha t + 1)^2 < (\alpha t + 1)^2 + 3. \qquad (38)$$


But inequality (38) holds for all t > 0. Hence the inflexion point 
criterion is satisfied. 
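
The inflexion-point result is easy to verify numerically. The sketch below evaluates the positive root of the quadratic in x² obtained from equation (33), confirms that it satisfies that equation, and tabulates how the inflexion point moves inward as αt grows. The function names are illustrative.

```python
import math

def inflexion_x_squared(alpha_t):
    """Positive root of 6*x^4 + 4*x^2*(alpha_t + 1) - 2 = 0, solved for x^2."""
    return (-(alpha_t + 1.0) + math.sqrt((alpha_t + 1.0) ** 2 + 3.0)) / 3.0

def residual_33(x_squared, alpha_t):
    """Left-hand side of equation (33) evaluated at a given x^2."""
    return 6.0 * x_squared ** 2 + 4.0 * x_squared * (alpha_t + 1.0) - 2.0

alpha_t_values = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
positions = [math.sqrt(inflexion_x_squared(v)) for v in alpha_t_values]
for v, x_star in zip(alpha_t_values, positions):
    print(v, x_star, residual_33(x_star ** 2, v))
```

At αt = 0 the root is x² = 1/3, the inflexion point of E(x) itself; for larger αt the two inflexion points close in on the origin.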

We shall now compute the slope of h(x) at the inflexion points 
and inquire whether it increases with time. Substituting E(x) = 1/(1 + x²) 
and the expression for E′(x) given by equation (28) into 
(30) we obtain 


$$\frac{\partial h}{\partial x} = (I - h_0)\, \alpha t\, \exp\!\left[\frac{\alpha t}{1 + x^2}\right] \frac{2x}{(1 + x^2)^2}. \qquad (39)$$


Therefore, the slope of h(x) at the inflexion point x = x* is given by 

$$\left.\frac{\partial h}{\partial x}\right|_{x = x^*} = (I - h_0)\, \alpha t\, \exp\!\left[\frac{\alpha t}{1 + (x^*)^2}\right] \frac{2x^*}{[1 + (x^*)^2]^2}. \qquad (40)$$


The expression for x* is given by equation (34). One could, 
therefore, proceed to evaluate the slope at the inflexion point by substituting 
the right-hand member of (34) into the right-hand member 
of (40). This, however, would lead to extremely unwieldy expressions. 
An alternative approach is to solve for αt from equation (33) 
in terms of x* and to substitute the resulting expression into equation 
(40). Our previous result on x* indicates that as t increases, x* 
decreases. Therefore, to ask how the slope ∂h/∂x at x* behaves as t increases is 
equivalent to asking how it behaves as x* decreases, if all we are interested 
in is direction of change. 

Solving equation (33) for αt we obtain 


$$\alpha t = \frac{1 - 2(x^*)^2 - 3(x^*)^4}{2(x^*)^2}. \qquad (41)$$


Expression (40) becomes 

$$\left.\frac{\partial h}{\partial x}\right|_{x = x^*} = (I - h_0)\, \frac{1 - 2(x^*)^2 - 3(x^*)^4}{x^* [1 + (x^*)^2]^2} \exp\!\left\{\frac{1 - 2(x^*)^2 - 3(x^*)^4}{2(x^*)^2 [1 + (x^*)^2]}\right\}. \qquad (42)$$

Note that since αt and x* are always positive in the context of 
our discussion, equation (41) implies that 1 − 2(x*)² − 3(x*)⁴ > 0 
for all t. This, in turn, implies that there exists an upper bound on 
x* (namely x* < 1/√3, since the polynomial factors as 
[1 − 3(x*)²][1 + (x*)²]). It may appear that this upper bound does not depend upon the 
units of x. However, the units of x were fixed by the expression 
for E(x). 

It can be seen by inspection of equation (42) that both the 
rational and the exponential factors in the right-hand member increase 
as x* decreases. But we have previously shown that x* decreases 
as t increases. Therefore, the slope at the inflexion point increases 
for increasing t. 
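
The growth of the slope at the inflexion point can be confirmed numerically in two ways: directly from the slope expression in terms of αt and x*, and from the form in which αt has been eliminated in favor of x*. The sketch below computes both and checks that they agree; the factor I − h₀ = 0.2 is an arbitrary illustration.

```python
import math

I_MINUS_H0 = 0.2   # illustrative value of I - h0

def inflexion_x(alpha_t):
    """Inflexion-point coordinate x* as a function of alpha*t."""
    x_squared = (-(alpha_t + 1.0) + math.sqrt((alpha_t + 1.0) ** 2 + 3.0)) / 3.0
    return math.sqrt(x_squared)

def slope_from_alpha_t(alpha_t):
    """Slope at x*, written in terms of alpha*t and x* (form of equation (40))."""
    x = inflexion_x(alpha_t)
    return (I_MINUS_H0 * alpha_t * math.exp(alpha_t / (1.0 + x * x))
            * 2.0 * x / (1.0 + x * x) ** 2)

def slope_from_x_star(x):
    """Slope at x*, with alpha*t eliminated in favor of x* (form of equation (42))."""
    poly = 1.0 - 2.0 * x ** 2 - 3.0 * x ** 4
    rational = poly / (x * (1.0 + x * x) ** 2)
    exponential = math.exp(poly / (2.0 * x * x * (1.0 + x * x)))
    return I_MINUS_H0 * rational * exponential

alpha_t_values = [0.5, 1.0, 2.0, 4.0]
slopes = [slope_from_alpha_t(v) for v in alpha_t_values]
for v, s in zip(alpha_t_values, slopes):
    print(v, inflexion_x(v), s, slope_from_x_star(inflexion_x(v)))
```
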

The foregoing criteria for the progressive refinement of response, 
namely, the increasing curvature at the minimum threshold, 
the decreasing distance between the inflexion points of the threshold 
curve, and the increasing slope at the inflexion points are, of course, 
not exhaustive. One might, for example, compare 
the minimum threshold with the average threshold as time progresses. 
Efficient learning would be characterized by a much more 
rapid change of the minimum threshold as compared with that of the 
average. 
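
The comparison of minimum and average threshold can also be sketched numerically. The example below assumes the linear-decrease solution h(x, t) = I − (I − h₀) exp[αt/(1 + x²)] on a finite array −5 ≤ x ≤ 5; the array extent, grid size, and constants are assumptions made only for illustration.

```python
import math

H0, I, ALPHA = 1.8, 2.0, 1.0   # illustrative values only

def threshold(x, t):
    """h(x, t) for E(x) = 1 / (1 + x^2) and constant h0 and I."""
    return I - (I - H0) * math.exp(ALPHA * t / (1.0 + x * x))

def threshold_drops(t, half_width=5.0, points=1001):
    """Return (drop of the minimum threshold, drop of the array-average threshold)."""
    xs = [-half_width + 2.0 * half_width * i / (points - 1) for i in range(points)]
    hs = [threshold(x, t) for x in xs]
    return H0 - min(hs), H0 - sum(hs) / len(hs)

for t in (0.25, 0.5, 1.0):
    min_drop, mean_drop = threshold_drops(t)
    print(t, min_drop, mean_drop, min_drop / mean_drop)
```

Under these assumptions the ratio of the two drops grows with t, which is one way of expressing that the minimum pulls away from the average as learning proceeds.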

In the foregoing analysis we assumed that the threshold de- 
creased linearly with the activity of the neuron. One can, of course, 
make other assumptions such as exponential decay. Thus 


$$h(x,t) = h_0 \exp\{-E(x)\, N(x,t)\}. \qquad (43)$$


As in the preceding case we have 

$$N(x,t) = \int_0^t [I(x) - h(x,\tau)]\, d\tau.$$


This leads to the differential equation 

$$\frac{dh}{dt} = -EIh + Eh^2 \qquad (44)$$

and to the solution 

$$h = \frac{I h_0}{h_0 + (I - h_0)\, e^{EIt}}. \qquad (45)$$


As in the first case, h decreases in time and its rate of decrease 
is most rapid where E is largest. 
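
The exponential-decay variant can be checked in the same way as the linear one. The sketch below integrates dh/dt = −EIh + Eh² with a fourth-order Runge-Kutta step and compares the result with the closed form h = I h₀ / [h₀ + (I − h₀) e^{EIt}], which is the solution of that equation with h(0) = h₀. Parameter values are illustrative.

```python
import math

def decay_closed_form(h0, I, E, t):
    """h = I*h0 / (h0 + (I - h0) * exp(E*I*t)), solving dh/dt = -E*I*h + E*h^2."""
    return I * h0 / (h0 + (I - h0) * math.exp(E * I * t))

def decay_numeric(h0, I, E, t, steps=10000):
    """RK4 integration of dh/dt = -E*I*h + E*h^2 from h(0) = h0."""
    def rate(h):
        return -E * I * h + E * h * h
    h, dt = h0, t / steps
    for _ in range(steps):
        k1 = rate(h)
        k2 = rate(h + 0.5 * dt * k1)
        k3 = rate(h + 0.5 * dt * k2)
        k4 = rate(h + dt * k3)
        h += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return h

h0, I = 1.8, 2.0   # illustrative values only
for E in (0.25, 0.5, 1.0):
    print(E, decay_closed_form(h0, I, E, 1.0), decay_numeric(h0, I, E, 1.0))
```
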

Another modification can be introduced into the picture by as- 
suming that the thresholds have a tendency to return to their origi- 
nal values. As a first approximation we may suppose that this ten- 
dency is proportional to the difference between the original value of 
the threshold and its mean value at a given moment. The differential 
equation for this case would be 


$$\frac{dh}{dt} = \frac{dh_1}{dt} + a(h_0 - h), \qquad (46)$$

where dh₁/dt = −EIh + Eh² as in the preceding case. Equation 
(46) leads to the solution 


$$h = h_0 + e^{-at} \int_0^t e^{a\tau} F(\tau)\, d\tau, \qquad (47)$$


where 

$$F(t) = -\frac{E I^2 h_0 (I - h_0)\, e^{EIt}}{[h_0 + (I - h_0)\, e^{EIt}]^2}. \qquad (48)$$


Generalizations of the Model. 


The foregoing model depended upon the responses of a “success 
center” to the removal of noxious stimuli. This model can be easily 
generalized to one where both “pleasant” and “unpleasant” stimuli 
are taken into consideration. 

Let now the linear array be innervated by two groups of receptors 
responding respectively to pleasant stimuli Sₚ and unpleasant 
stimuli Sᵤ. As in the previous model, tracts from Sᵤ inhibit the 
success center. The tracts from Sₚ, however, excite the success center. 
Now if some motor activity stimulated by some neurons in the 
linear array manipulates the environment in such a way as to increase 
the intensity of a pleasant stimulus Sₚ, this will result in a 
greater activity in the success center. This in turn lowers the thresh- 
olds of the motor neurons in the linear array which were involved in 
the “successful” activity. Thus by the same reasoning as in the fore- 
going case the learning of ‘successful’ responses to pleasant stimuli 
is accounted for. 

An obvious shortcoming of the models discussed here is the fact 
that in all of them not only the thresholds of the “correct” neurons 
decrease but also the thresholds of all the “wrong” neurons. This 
difficulty can be overcome by a mathematical artifact by allowing 
E(x) to assume negative values. For such loci, x, where E(x) is 
negative, we would then have the threshold increase with time as the 
inspection of equation (22) shows. This artifact is not applicable in 
our case. Our model is based on the supposition that the activity of 
the success center always tends to lower the thresholds in the linear 
array. Negative E(x), therefore, would have no physical meaning, 
as a glance at equation (17) shows. The objection can nevertheless 
be met by the following considerations. 

The rate of decrease of the thresholds of those neurons involved 
in the most “successful” activity will be the most rapid as the fore- 
going mathematical analysis has shown. Hence the average thresh- 
old h of the “best” neurons will reach S, the lowest level permitted 
by our equations, before any of the others. This imposed lower bound 
on h, however, was dictated solely by convenience. It was made to 
avoid cumbersome discontinuities which would otherwise appear in 
equations (16) and (18). We can, however, extrapolate the results 
of our mathematical analysis by a qualitative argument. 

Once the average threshold of a neuron in the linear array 
reaches the stimulus level, it may be expected to fire almost every 
time the stimulus is presented, provided the latter is of sufficient 
duration. Its threshold will thereby be lowered even further. Its 
responses to the stimulus will become even more certain until a situa- 
tion will prevail where only rare fluctuations of the threshold will be 
of sufficient magnitude and proper direction to prevent its firing. 

In the meantime another effect will have entered the picture. 


While the average thresholds were still sufficiently high, the response- 
time included the expected time of occurrence of a sufficiently large 
fluctuation of threshold in the responding motor neuron. This means 
that in the beginning of the learning process the response times are, 
on the average, comparatively long. When, however, the thresholds 
of the “best” neurons, as described above, reach very low levels their 
response-times become very brief. This practically eliminates the re- 
sponses of the other neurons since the response of the best neurons 
almost immediately eliminates the noxious stimuli. Thus the thresh- 
old distribution in the linear array becomes practically static as soon 
as the organism has learned to respond specifically and invariably 
to a noxious stimulus. For the case of pleasant stimuli discussed in 
the generalized model, this argument is not valid. This is so because 
the pleasant stimuli are not removed but, on the contrary, increased 
by the responses and thus may bring additional neurons into play. 

A further generalization involves a “failure” center as well as 
the success center. Inhibitory tracts from the failure center inner- 
vate the linear array in much the same way as the success center. 
The effect of these inhibiting fibers, however, is to raise the thresh- 
olds of recently active neurons. The failure center is assumed to be 
excited by tracts leading from the receptors of noxious stimuli and 
inhibited by tracts leading from the receptors of pleasant stimuli. 
The function E(x) will now be assumed to take on both positive and 
negative values. The results of the activity of a neuron at x affect 
the failure center in a way opposite from their effect on the success 
center, i.e. the strongest effect is by those neurons where E(x) is 
minimal. 


Let us now consider what happens at the failure center as a re- 
sult of the activity of a neuron at x. If the response is “bad” i.e. 
E(x) is small (or negatively large), the failure center will be stimu- 
lated most. Its activity will then result in the greatest raising of the 
threshold of the “offending” neuron. Hence future responses of 
that neuron will tend to be eliminated. The effects of “good” neurons 
on the failure center are negligible and hence their thresholds will not tend to be 
significantly raised but will be lowered by the success center as in the 
previous case. 

The inevitable general lowering of thresholds in the first model 
may be also avoided by supposing the existence of an additional mechanism 
which tends to raise all thresholds indiscriminately. 


Further Implications. 


Suppose that the organism has been subjected to a given learn- 
ing situation. As we have seen, the effect of such an experience would 
be to change its h curve. 

If now the organism is subjected to a new learning situation involving 
the same learning mechanism, its new h curve will now play 
the part of the original h₀(x) curve. Furthermore, the E(x) associated 
with the new situation will, in general, be different. The question 
now arises: What are the invariants of such learning processes? 
One of the goals of the mathematical biology of nervous systems is 
to interpret such rather hazy terms as flexibility, memory, adapta- 
bility, learning transference, intelligence, abstraction ability, etc. in 
terms of parameters such as those discussed here. The notion of 
special ability, for example, to which we referred in connection with 
the “ganglion-brain,” was interpreted in terms of biases in the con- 
nectivity of the net. We see that in the linear array model this no- 
tion assumes a somewhat more definite meaning. It may specifically 
refer to the distribution of connections from the success center to 
the linear array. 


Alternative Interpretations of the Model. 

It should be emphasized once again that our linear array is a 
mere mathematical artifact. Its only purpose is to “order” the neu- 
rons governing various responses in accordance with the effective- 
ness of the response. In particular, the loci of the array, which are 
in no way to be interpreted as geographical loci, may be considered 
as groups of neurons governing various patterns of response. These 
patterns may be simultaneous configurations of active motor organs 
or, even more generally, temporal sequences of such configurations. 
The flexion of a limb, the act of speaking, or even sequences of 
“thought” may be regarded as examples. The learning process then, 
according to this model, appears as a selective narrowing down to the 
optimal pattern of response. 